|
Size: 996
Comment: Initial commit
|
Size: 2070
Comment:
|
| Deletions are marked like this. | Additions are marked like this. |
| Line 1: | Line 1: |
| = SAS Data Model = | = SAS Data Step = <<TableOfContents>> ---- == Data Model == |
| Line 4: | Line 12: |
| Line 10: | Line 19: |
| STATEMENTS; run; }}} = Quick Tips = == Copying Tables == {{{ data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; |
STATEMENTS; |
| Line 28: | Line 25: |
| == Subsetting Tables == | == Tables == The table named in a data step is always the ''output table''. By default it is also the ''input table'' and data is manipulated in place. To set a different table as the input, use a `set` statement. |
| Line 31: | Line 34: |
| data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; where EXPR; keep VARLIST1; drop VARLIST2; |
data LIBREF.OUTPUT; set LIBREF.INPUT; STATEMENTS; |
| Line 41: | Line 42: |
| == Coercing Data Types == | == Transforming Data == === Coercing Data Types === |
| Line 48: | Line 53: |
| NUM_ID = input(ANUM_ID, 8.); | NUM_ID = input(ALNUM_ID, 8.); |
| Line 63: | Line 68: |
| The `input` function generates a variable `_ERROR_` by default, flagging cases that could not be formatted. To suppress this variable's creation, use `input(ALNUM_ID, ?? 8.)`. ---- == Filtering Data == === Subset Cases === To subset the cases of a data table, use `where` statements. {{{ data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; where EXPR; run; }}} Alternatively, toggle the output data table with `if` statements. {{{ data LIBREF.NEWTABLE1; set LIBREF.OLDTABLE; if EXPR then output LIBREF.NEWTABLE1; else output LIBREF.NEWTABLE2; run; }}} === Subset Variables === To subset the variables of a data table, use `keep` and `drop` statements. {{{ data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; keep VARLIST1; drop VARLIST2; run; }}} === De-duplication === This is actually best done using a procedure step. See [[CrosswalkDuplicates#SAS|here]] for details. |
SAS Data Step
Contents
Data Model
The data model for SAS is:
- Read a row of data
- Process statements sequentially
On run, write data into the step's LIBREF.TABLE
data LIBREF.TABLE; STATEMENTS; run;
Tables
The table named in a data step is always the output table. By default it is also the input table and data is manipulated in place.
To set a different table as the input, use a set statement.
data LIBREF.OUTPUT; set LIBREF.INPUT; STATEMENTS; run;
Transforming Data
Coercing Data Types
data LIBREF.NEWTABLE;
set LIBREF.OLDTABLE;
/* try converting alphanumeric to numeric */
NUM_ID = input(ALNUM_ID, 8.);
/* left-align text */
length TEXT_VAR $999.;
TEXT_VAR = put(TEXT_VAR, $999. -L);
/* parse timestamps like '01Jan1999' */
format TIMESTAMP_VAR DATE9.;
MONTH_VAR = month(TIMESTAMP_VAR);
/* round numerics to integers */
format INT_VAR 10.;
run;The input function generates a variable _ERROR_ by default, flagging cases that could not be formatted. To suppress this variable's creation, use input(ALNUM_ID, ?? 8.).
Filtering Data
Subset Cases
To subset the cases of a data table, use where statements.
data LIBREF.NEWTABLE;
set LIBREF.OLDTABLE;
where EXPR;
run;Alternatively, toggle the output data table with if statements.
data LIBREF.NEWTABLE1;
set LIBREF.OLDTABLE;
if EXPR then output LIBREF.NEWTABLE1; else output LIBREF.NEWTABLE2;
run;
Subset Variables
To subset the variables of a data table, use keep and drop statements.
data LIBREF.NEWTABLE;
set LIBREF.OLDTABLE;
keep VARLIST1;
drop VARLIST2;
run;
De-duplication
This is actually best done using a procedure step. See here for details.
