SAS Data Step


Data Model

The data model for SAS is:

data LIBREF.TABLE;
    STATEMENTS;
run;


Input and Output Tables

It's best practice to generate a new table to store transformations, rather than transforming data in place.

The table named in a data step is always the output table. To set a different table as the input, use a set statement.

data LIBREF.NEWTABLE;
    set LIBREF.OLDTABLE;
run;

The above code will simply copy data without transforming or filtering anything. This probably isn't all you want to do, however.


Transforming Data

Coercing Data Types

data LIBREF.NEWTABLE;
    set LIBREF.OLDTABLE;

    /* try converting alphanumeric to numeric */
    NUM_ID = input(ALNUM_ID, 8.);

    /* left-align text */
    length TEXT_VAR $999.;
    TEXT_VAR = put(TEXT_VAR, $999. -L);

    /* parse timestamps like '01Jan1999' */
    format TIMESTAMP_VAR DATE9.;
    MONTH_VAR = month(TIMESTAMP_VAR);

    /* round numerics to integers */
    format INT_VAR 10.;
run;

The input function generates a variable _ERROR_ by default, flagging cases that could not be formatted. To suppress this variable's creation, use input(ALNUM_ID, ?? 8.).


Filtering Data

Subset Cases

To subset the cases of a data table, use where statements.

data LIBREF.NEWTABLE;
    set LIBREF.OLDTABLE;
    where EXPR;
run;

Alternatively, toggle the output data table with if statements.

data LIBREF.NEWTABLE1;
    set LIBREF.OLDTABLE;
    if EXPR then output LIBREF.NEWTABLE1; else output LIBREF.NEWTABLE2;
run;

Subset Variables

To subset the variables of a data table, use keep and drop statements.

data LIBREF.NEWTABLE;
    set LIBREF.OLDTABLE;
    keep VARLIST1;
    drop VARLIST2;
run;

De-duplication

This is actually best done using a procedure step. See here for details.


CategoryRicottone