Differences between revisions 5 and 6
Revision 5 as of 2023-01-14 21:40:10
Size: 2281
Comment:
Revision 6 as of 2023-01-14 21:43:20
Size: 2070
Comment:
Deletions are marked like this. Additions are marked like this.
Line 1: Line 1:
## page was renamed from SASDataStep
Line 20: Line 19:
   STATEMENTS;   STATEMENTS;
Line 28: Line 27:
== Input and Output Tables == == Tables ==
Line 30: Line 29:
It's best practice to generate a new table to store transformations, rather than transforming data in place. The table named in a data step is always the ''output table''. By default it is also the ''input table'' and data is manipulated in place.
Line 32: Line 31:
The table named in a data step is always the output table. To set a different table as the input, use a `set` statement. To set a different table as the input, use a `set` statement.
Line 35: Line 34:
data LIBREF.NEWTABLE;
   set LIBREF.OLDTABLE;
data LIBREF.OUTPUT;
  set LIBREF.INPUT;
  S
TATEMENTS;
Line 39: Line 39:

The above code will simply copy data without transforming or filtering anything. This probably isn't all you want to do, however.

SAS Data Step


Data Model

The data model for SAS is:

  • Read a row of data
  • Process statements sequentially
  • On run, write data into the step's LIBREF.TABLE

data LIBREF.TABLE;
  STATEMENTS;
run;


Tables

The table named in a data step is always the output table. By default it is also the input table and data is manipulated in place.

To set a different table as the input, use a set statement.

data LIBREF.OUTPUT;
  set LIBREF.INPUT;
  STATEMENTS;
run;


Transforming Data

Coercing Data Types

data LIBREF.NEWTABLE;
    set LIBREF.OLDTABLE;

    /* try converting alphanumeric to numeric */
    NUM_ID = input(ALNUM_ID, 8.);

    /* left-align text */
    length TEXT_VAR $999.;
    TEXT_VAR = put(TEXT_VAR, $999. -L);

    /* parse timestamps like '01Jan1999' */
    format TIMESTAMP_VAR DATE9.;
    MONTH_VAR = month(TIMESTAMP_VAR);

    /* round numerics to integers */
    format INT_VAR 10.;
run;

The input function generates a variable _ERROR_ by default, flagging cases that could not be formatted. To suppress this variable's creation, use input(ALNUM_ID, ?? 8.).


Filtering Data

Subset Cases

To subset the cases of a data table, use where statements.

data LIBREF.NEWTABLE;
    set LIBREF.OLDTABLE;
    where EXPR;
run;

Alternatively, toggle the output data table with if statements.

data LIBREF.NEWTABLE1;
    set LIBREF.OLDTABLE;
    if EXPR then output LIBREF.NEWTABLE1; else output LIBREF.NEWTABLE2;
run;

Subset Variables

To subset the variables of a data table, use keep and drop statements.

data LIBREF.NEWTABLE;
    set LIBREF.OLDTABLE;
    keep VARLIST1;
    drop VARLIST2;
run;

De-duplication

This is actually best done using a procedure step. See here for details.


CategoryRicottone

SAS/DataStep (last edited 2023-03-30 20:36:31 by DominicRicottone)