Size: 996
Comment: Initial commit
|
Size: 3213
Comment:
|
Deletions are marked like this. | Additions are marked like this. |
Line 1: | Line 1: |
= SAS Data Model = | = SAS Data Step = <<TableOfContents>> ---- == Data Model == |
Line 4: | Line 12: |
Line 6: | Line 15: |
* On `run`, write data into the step's `LIBREF.TABLE` | * On `run`, write data into the output table |
Line 10: | Line 19: |
STATEMENTS; | STATEMENTS; run; }}} ---- == Tables == The table named in a data step is always the ''output table''. By default it is also the ''input table'' and data is manipulated in place. To set a different table as the input, use a `set` statement. {{{ data LIBREF.OUTPUT; set LIBREF.INPUT; STATEMENTS; run; }}} ---- == Transforming Data == === Creating Variables === {{{ data LIBREF.OUTPUT; set LIBREF.INPUT; /* declare numeric variables directly with numeric literals */ foo = 1; /* declare string variables by specifying the length first, then with string literals */ length hello $5 world $5; hello = 'Hello'; world = 'World'; /* the length of a string variable cannot be changed again within the same data step */ length greeting $12; greeting = CAT(hello, world); /* 'Hello World' */ greeting = CATS(hello, world); /* 'HelloWorld' */ greeting = CATX(', ', hello, world); /* 'Hello, World' */ greeting = hello||world; /* 'HelloWorld' */ run; }}} Any valid [[SAS/Expressions|expression]] can be used to create a new variable. === Coercing Data Types === {{{ data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; /* coerce string to numeric */ numeric_zip_code = input(string_zip_code, 5.); /* left-align text */ length TEXT_VAR $999.; TEXT_VAR = put(TEXT_VAR, $999. -L); /* parse timestamps like '01Jan1999' */ format timestamp DATE9.; month = month(timestamp); /* round numerics to integers */ format test_score 3.; run; }}} The `input` function generates a variable `_ERROR_` by default, flagging cases that could not be formatted. To suppress this variable's creation, use `input(ALNUM_ID, ?? 8.)`. ---- == Filtering Data == === Subset Cases === To subset the cases of a data table, use `where` statements. {{{ data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; where EXPR; run; }}} Note that `where` can only be used with variables that exist before the data step. `if` statements can be used similarly and also bypass that last restriction. The downsides of this approach are that the operation is more computationally expensive, and there isn't a clear analogy to the `where` statements on procedures. {{{ data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; NEWVAR=1; if NEWVAR=1; run; }}} Alternatively, toggle the output data table with `if` statements. {{{ data LIBREF.NEWTABLE1; set LIBREF.OLDTABLE; if EXPR then output LIBREF.NEWTABLE1; else output LIBREF.NEWTABLE2; |
Line 16: | Line 139: |
= Quick Tips = | === Subset Variables === |
Line 18: | Line 141: |
== Copying Tables == | To subset the variables of a data table, use `keep` and `drop` statements. |
Line 22: | Line 145: |
set LIBREF.OLDTABLE; run; }}} ---- == Subsetting Tables == {{{ data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; where EXPR; keep VARLIST1; drop VARLIST2; run; }}} ---- == Coercing Data Types == {{{ data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; /* try converting alphanumeric to numeric */ NUM_ID = input(ANUM_ID, 8.); /* left-align text */ length TEXT_VAR $999.; TEXT_VAR = put(TEXT_VAR, $999. -L); /* parse timestamps like '01Jan1999' */ format TIMESTAMP_VAR DATE9.; MONTH_VAR = month(TIMESTAMP_VAR); /* round numerics to integers */ format INT_VAR 10.; |
set LIBREF.OLDTABLE; keep VARLIST1; drop VARLIST2; |
Line 65: | Line 153: |
=== De-duplication === This is actually best done using the `SORT` procedure. See [[SAS/Sort|here]] for details. |
SAS Data Step
Contents
Data Model
The data model for SAS is:
- Read a row of data
- Process statements sequentially
On run, write data into the output table
data LIBREF.TABLE; STATEMENTS; run;
Tables
The table named in a data step is always the output table. By default it is also the input table and data is manipulated in place.
To set a different table as the input, use a set statement.
data LIBREF.OUTPUT; set LIBREF.INPUT; STATEMENTS; run;
Transforming Data
Creating Variables
data LIBREF.OUTPUT; set LIBREF.INPUT; /* declare numeric variables directly with numeric literals */ foo = 1; /* declare string variables by specifying the length first, then with string literals */ length hello $5 world $5; hello = 'Hello'; world = 'World'; /* the length of a string variable cannot be changed again within the same data step */ length greeting $12; greeting = CAT(hello, world); /* 'Hello World' */ greeting = CATS(hello, world); /* 'HelloWorld' */ greeting = CATX(', ', hello, world); /* 'Hello, World' */ greeting = hello||world; /* 'HelloWorld' */ run;
Any valid expression can be used to create a new variable.
Coercing Data Types
data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; /* coerce string to numeric */ numeric_zip_code = input(string_zip_code, 5.); /* left-align text */ length TEXT_VAR $999.; TEXT_VAR = put(TEXT_VAR, $999. -L); /* parse timestamps like '01Jan1999' */ format timestamp DATE9.; month = month(timestamp); /* round numerics to integers */ format test_score 3.; run;
The input function generates a variable _ERROR_ by default, flagging cases that could not be formatted. To suppress this variable's creation, use input(ALNUM_ID, ?? 8.).
Filtering Data
Subset Cases
To subset the cases of a data table, use where statements.
data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; where EXPR; run;
Note that where can only be used with variables that exist before the data step.
if statements can be used similarly and also bypass that last restriction. The downsides of this approach are that the operation is more computationally expensive, and there isn't a clear analogy to the where statements on procedures.
data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; NEWVAR=1; if NEWVAR=1; run;
Alternatively, toggle the output data table with if statements.
data LIBREF.NEWTABLE1; set LIBREF.OLDTABLE; if EXPR then output LIBREF.NEWTABLE1; else output LIBREF.NEWTABLE2; run;
Subset Variables
To subset the variables of a data table, use keep and drop statements.
data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; keep VARLIST1; drop VARLIST2; run;
De-duplication
This is actually best done using the SORT procedure. See here for details.