= SAS Data Step = In SAS, data is manipulated in '''data steps'''. <> ---- == Data Model == The data model for SAS is: * Read a row of data * Process statements sequentially * On `run`, write data into the output table {{{ data LIBREF.TABLE; STATEMENTS; run; }}} ---- == Tables == The table named in a data step is always the ''output table''. By default it is also the ''input table'' and data is manipulated in place. To set a different table as the input, use a `set` statement. {{{ data LIBREF.OUTPUT; set LIBREF.INPUT; STATEMENTS; run; }}} ---- == Data Input == See [[SAS/ReadingData|here]]. ---- == Transforming Data == === Creating Variables === {{{ data LIBREF.OUTPUT; set LIBREF.INPUT; /* declare numeric variables directly with numeric literals */ foo = 1; /* declare string variables by specifying the length first, then with string literals */ length hello $5 world $5; hello = 'Hello'; world = 'World'; /* the length of a string variable cannot be changed again within the same data step */ length greeting $12; greeting = CAT(hello, world); /* 'Hello World' */ greeting = CATS(hello, world); /* 'HelloWorld' */ greeting = CATX(', ', hello, world); /* 'Hello, World' */ greeting = hello||world; /* 'HelloWorld' */ run; }}} Any valid [[SAS/Expressions|expression]] can be used to create a new variable. === Coercing Data Types === {{{ data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; /* coerce string to numeric */ numeric_zip_code = input(string_zip_code, 5.); /* left-align text */ length TEXT_VAR $999.; TEXT_VAR = put(TEXT_VAR, $999. -L); /* parse timestamps like '01Jan1999' */ format timestamp DATE9.; month = month(timestamp); /* round numerics to integers */ format test_score 3.; run; }}} The `input` function generates a variable `_ERROR_` by default, flagging cases that could not be formatted. To suppress this variable's creation, use `input(ALNUM_ID, ?? 8.)`. ---- == Joining Data == See [[SAS/JoiningData|here]]. ---- == Filtering Data == === Subset Cases === To subset the cases of a data table, use `where` statements. {{{ data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; where EXPR; run; }}} Note that `where` can only be used with variables that exist before the data step. `if` statements can be used similarly and also bypass that last restriction. The downsides of this approach are that the operation is more computationally expensive, and there isn't a clear analogy to the `where` statements on procedures. {{{ data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; NEWVAR=1; if NEWVAR=1; run; }}} Alternatively, toggle the output data table with `if` statements. {{{ data LIBREF.NEWTABLE1; set LIBREF.OLDTABLE; if EXPR then output LIBREF.NEWTABLE1; else output LIBREF.NEWTABLE2; run; }}} === Subset Variables === To subset the variables of a data table, use `keep` and `drop` statements. {{{ data LIBREF.NEWTABLE; set LIBREF.OLDTABLE; keep VARLIST1; drop VARLIST2; run; }}} === De-duplication === This is actually best done using the `SORT` procedure. See [[SAS/Sort|here]] for details. ---- CategoryRicottone