Size: 2281
Comment:
|
Size: 2804
Comment:
|
Deletions are marked like this. | Additions are marked like this. |
Line 1: | Line 1: |
## page was renamed from SASDataStep | |
Line 16: | Line 15: |
* On `run`, write data into the step's `LIBREF.TABLE` | * On `run`, write data into the output table |
Line 20: | Line 19: |
STATEMENTS; | STATEMENTS; |
Line 28: | Line 27: |
== Input and Output Tables == | == Tables == |
Line 30: | Line 29: |
It's best practice to generate a new table to store transformations, rather than transforming data in place. | The table named in a data step is always the ''output table''. By default it is also the ''input table'' and data is manipulated in place. |
Line 32: | Line 31: |
The table named in a data step is always the output table. To set a different table as the input, use a `set` statement. | To set a different table as the input, use a `set` statement. |
Line 35: | Line 34: |
data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; |
data LIBREF.OUTPUT; set LIBREF.INPUT; STATEMENTS; |
Line 39: | Line 39: |
The above code will simply copy data without transforming or filtering anything. This probably isn't all you want to do, however. |
|
Line 48: | Line 46: |
=== Creating Variables === {{{ data LIBREF.OUTPUT; set LIBREF.INPUT; /* declare numeric variables directly with numeric literals */ foo = 1; /* declare string variables by specifying the length first, then with string literals */ length hello $5 world $5; hello = 'Hello'; world = 'World'; /* the length of a string variable cannot be changed again within the same data step */ length greeting $12; greeting = CAT(hello, world); /* 'Hello World' */ greeting = CATS(hello, world); /* 'HelloWorld' */ greeting = CATX(', ', hello, world); /* 'Hello, World' */ greeting = hello||world; /* 'HelloWorld' */ run; }}} Any valid [[SAS/Expression|expression]] can be used to create a new variable. |
|
Line 52: | Line 79: |
set LIBREF.OLDTABLE; | set LIBREF.OLDTABLE; |
Line 54: | Line 81: |
/* try converting alphanumeric to numeric */ NUM_ID = input(ALNUM_ID, 8.); |
/* coerce string to numeric */ numeric_zip_code = input(string_zip_code, 5.); |
Line 57: | Line 84: |
/* left-align text */ length TEXT_VAR $999.; TEXT_VAR = put(TEXT_VAR, $999. -L); |
/* left-align text */ length TEXT_VAR $999.; TEXT_VAR = put(TEXT_VAR, $999. -L); |
Line 61: | Line 88: |
/* parse timestamps like '01Jan1999' */ format TIMESTAMP_VAR DATE9.; MONTH_VAR = month(TIMESTAMP_VAR); |
/* parse timestamps like '01Jan1999' */ format timestamp DATE9.; month = month(timestamp); |
Line 65: | Line 92: |
/* round numerics to integers */ format INT_VAR 10.; |
/* round numerics to integers */ format test_score 3.; |
Line 116: | Line 143: |
This is actually best done using a procedure step. See [[CrosswalkDuplicates#SAS|here]] for details. | This is actually best done using the `SORT` procedure. See [[SAS/Sort|here]] for details. |
SAS Data Step
Contents
Data Model
The data model for SAS is:
- Read a row of data
- Process statements sequentially
On run, write data into the output table
data LIBREF.TABLE; STATEMENTS; run;
Tables
The table named in a data step is always the output table. By default it is also the input table and data is manipulated in place.
To set a different table as the input, use a set statement.
data LIBREF.OUTPUT; set LIBREF.INPUT; STATEMENTS; run;
Transforming Data
Creating Variables
data LIBREF.OUTPUT; set LIBREF.INPUT; /* declare numeric variables directly with numeric literals */ foo = 1; /* declare string variables by specifying the length first, then with string literals */ length hello $5 world $5; hello = 'Hello'; world = 'World'; /* the length of a string variable cannot be changed again within the same data step */ length greeting $12; greeting = CAT(hello, world); /* 'Hello World' */ greeting = CATS(hello, world); /* 'HelloWorld' */ greeting = CATX(', ', hello, world); /* 'Hello, World' */ greeting = hello||world; /* 'HelloWorld' */ run;
Any valid expression can be used to create a new variable.
Coercing Data Types
data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; /* coerce string to numeric */ numeric_zip_code = input(string_zip_code, 5.); /* left-align text */ length TEXT_VAR $999.; TEXT_VAR = put(TEXT_VAR, $999. -L); /* parse timestamps like '01Jan1999' */ format timestamp DATE9.; month = month(timestamp); /* round numerics to integers */ format test_score 3.; run;
The input function generates a variable _ERROR_ by default, flagging cases that could not be formatted. To suppress this variable's creation, use input(ALNUM_ID, ?? 8.).
Filtering Data
Subset Cases
To subset the cases of a data table, use where statements.
data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; where EXPR; run;
Alternatively, toggle the output data table with if statements.
data LIBREF.NEWTABLE1; set LIBREF.OLDTABLE; if EXPR then output LIBREF.NEWTABLE1; else output LIBREF.NEWTABLE2; run;
Subset Variables
To subset the variables of a data table, use keep and drop statements.
data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; keep VARLIST1; drop VARLIST2; run;
De-duplication
This is actually best done using the SORT procedure. See here for details.