Size: 1172
Comment:
|
Size: 2033
Comment:
|
Deletions are marked like this. | Additions are marked like this. |
Line 1: | Line 1: |
= SAS Data Model = | = SAS Data Step = <<TableOfContents>> ---- == Data Model == |
Line 4: | Line 12: |
Line 14: | Line 23: |
---- | |
Line 16: | Line 26: |
= Quick Tips = | |
Line 18: | Line 27: |
== Copying Tables == | == Input and Output Tables == It's best practice to generate a new table to store transformations, rather than transforming data in place. The table named in a data step is always the output table. To set a different table as the input, use a `set` statement. |
Line 26: | Line 39: |
---- == Subsetting Tables == {{{ data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; where EXPR; keep VARLIST1; drop VARLIST2; run; }}} |
The above code will simply copy data without transforming or filtering anything. This probably isn't all you want to do, however. |
Line 41: | Line 43: |
== Coercing Data Types == | == Transforming Data == === Coercing Data Types === |
Line 48: | Line 54: |
NUM_ID = input(ANUM_ID, 8.); | NUM_ID = input(ALNUM_ID, 8.); |
Line 63: | Line 69: |
The `input` function generates a variable `_ERROR_` by default, flagging cases that could not be formatted. To suppress this variable's creation, use `input(ANUM_ID, ?? 8.)`. | The `input` function generates a variable `_ERROR_` by default, flagging cases that could not be formatted. To suppress this variable's creation, use `input(ALNUM_ID, ?? 8.)`. ---- == Filtering Data == === Subset Cases === To subset the cases of a data table, use `where` statements. {{{ data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; where EXPR; run; }}} === Subset Variables === To subset the variables of a data table, use `keep` and `drop` statements. {{{ data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; keep VARLIST1; drop VARLIST2; run; }}} === De-duplication === This is actually best done using a procedure step. See [[CrosswalkDuplicates#SAS|here]] for details. |
SAS Data Step
Contents
Data Model
The data model for SAS is:
- Read a row of data
- Process statements sequentially
On run, write data into the step's LIBREF.TABLE
data LIBREF.TABLE; STATEMENTS; run;
Input and Output Tables
It's best practice to generate a new table to store transformations, rather than transforming data in place.
The table named in a data step is always the output table. To set a different table as the input, use a set statement.
data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; run;
The above code will simply copy data without transforming or filtering anything. This probably isn't all you want to do, however.
Transforming Data
Coercing Data Types
data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; /* try converting alphanumeric to numeric */ NUM_ID = input(ALNUM_ID, 8.); /* left-align text */ length TEXT_VAR $999.; TEXT_VAR = put(TEXT_VAR, $999. -L); /* parse timestamps like '01Jan1999' */ format TIMESTAMP_VAR DATE9.; MONTH_VAR = month(TIMESTAMP_VAR); /* round numerics to integers */ format INT_VAR 10.; run;
The input function generates a variable _ERROR_ by default, flagging cases that could not be formatted. To suppress this variable's creation, use input(ALNUM_ID, ?? 8.).
Filtering Data
Subset Cases
To subset the cases of a data table, use where statements.
data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; where EXPR; run;
Subset Variables
To subset the variables of a data table, use keep and drop statements.
data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; keep VARLIST1; drop VARLIST2; run;
De-duplication
This is actually best done using a procedure step. See here for details.