Differences between revisions 5 and 7 (spanning 2 versions)
Revision 5 as of 2023-01-14 21:40:10
Size: 2281
Comment:
Revision 7 as of 2023-01-14 21:58:42
Size: 2804
Comment:
Deletions are marked like this. Additions are marked like this.
Line 1: Line 1:
## page was renamed from SASDataStep
Line 16: Line 15:
 * On `run`, write data into the step's `LIBREF.TABLE`  * On `run`, write data into the output table
Line 20: Line 19:
   STATEMENTS;   STATEMENTS;
Line 28: Line 27:
== Input and Output Tables == == Tables ==
Line 30: Line 29:
It's best practice to generate a new table to store transformations, rather than transforming data in place. The table named in a data step is always the ''output table''. By default it is also the ''input table'' and data is manipulated in place.
Line 32: Line 31:
The table named in a data step is always the output table. To set a different table as the input, use a `set` statement. To set a different table as the input, use a `set` statement.
Line 35: Line 34:
data LIBREF.NEWTABLE;
   set LIBREF.OLDTABLE;
data LIBREF.OUTPUT;
  set LIBREF.INPUT;
  S
TATEMENTS;
Line 39: Line 39:

The above code will simply copy data without transforming or filtering anything. This probably isn't all you want to do, however.
Line 48: Line 46:


=== Creating Variables ===

{{{
data LIBREF.OUTPUT;
  set LIBREF.INPUT;

  /* declare numeric variables directly with numeric literals */
  foo = 1;

  /* declare string variables by specifying the length first, then with string literals */
  length hello $5 world $5;
  hello = 'Hello';
  world = 'World';

  /* the length of a string variable cannot be changed again within the same data step */
  length greeting $12;
  greeting = CAT(hello, world); /* 'Hello World' */
  greeting = CATS(hello, world); /* 'HelloWorld' */
  greeting = CATX(', ', hello, world); /* 'Hello, World' */
  greeting = hello||world; /* 'HelloWorld' */
run;
}}}

Any valid [[SAS/Expression|expression]] can be used to create a new variable.


Line 52: Line 79:
   set LIBREF.OLDTABLE;   set LIBREF.OLDTABLE;
Line 54: Line 81:
   /* try converting alphanumeric to numeric */
    NUM_ID = input(ALNUM_ID, 8.);
  /* coerce string to numeric */
  numeric_zip_code = input(string_zip_code, 5.);
Line 57: Line 84:
   /* left-align text */
    length TEXT_VAR $999.;
   TEXT_VAR = put(TEXT_VAR, $999. -L);
  /* left-align text */
  length TEXT_VAR $999.;
  TEXT_VAR = put(TEXT_VAR, $999. -L);
Line 61: Line 88:
   /* parse timestamps like '01Jan1999' */
    format TIMESTAMP_VAR DATE9.;
    MONTH_VAR = month(TIMESTAMP_VAR);
  /* parse timestamps like '01Jan1999' */
  format timestamp DATE9.;
  month = month(timestamp);
Line 65: Line 92:
   /* round numerics to integers */
    format INT_VAR 10.;
  /* round numerics to integers */
  format test_score 3.;
Line 116: Line 143:
This is actually best done using a procedure step. See [[CrosswalkDuplicates#SAS|here]] for details. This is actually best done using the `SORT` procedure. See [[SAS/Sort|here]] for details.

SAS Data Step


Data Model

The data model for SAS is:

  • Read a row of data
  • Process statements sequentially
  • On run, write data into the output table

data LIBREF.TABLE;
  STATEMENTS;
run;


Tables

The table named in a data step is always the output table. By default it is also the input table and data is manipulated in place.

To set a different table as the input, use a set statement.

data LIBREF.OUTPUT;
  set LIBREF.INPUT;
  STATEMENTS;
run;


Transforming Data

Creating Variables

data LIBREF.OUTPUT;
  set LIBREF.INPUT;

  /* declare numeric variables directly with numeric literals */
  foo = 1;

  /* declare string variables by specifying the length first, then with string literals */
  length hello $5 world $5;
  hello = 'Hello';
  world = 'World';

  /* the length of a string variable cannot be changed again within the same data step */
  length greeting $12;
  greeting = CAT(hello, world);        /* 'Hello World'  */
  greeting = CATS(hello, world);       /* 'HelloWorld'   */
  greeting = CATX(', ', hello, world); /* 'Hello, World' */
  greeting = hello||world;             /* 'HelloWorld'   */
run;

Any valid expression can be used to create a new variable.

Coercing Data Types

data LIBREF.NEWTABLE;
  set LIBREF.OLDTABLE;

  /* coerce string to numeric */
  numeric_zip_code = input(string_zip_code, 5.);

  /* left-align text */
  length TEXT_VAR $999.;
  TEXT_VAR = put(TEXT_VAR, $999. -L);

  /* parse timestamps like '01Jan1999' */
  format timestamp DATE9.;
  month = month(timestamp);

  /* round numerics to integers */
  format test_score 3.;
run;

The input function generates a variable _ERROR_ by default, flagging cases that could not be formatted. To suppress this variable's creation, use input(ALNUM_ID, ?? 8.).


Filtering Data

Subset Cases

To subset the cases of a data table, use where statements.

data LIBREF.NEWTABLE;
    set LIBREF.OLDTABLE;
    where EXPR;
run;

Alternatively, toggle the output data table with if statements.

data LIBREF.NEWTABLE1;
    set LIBREF.OLDTABLE;
    if EXPR then output LIBREF.NEWTABLE1; else output LIBREF.NEWTABLE2;
run;

Subset Variables

To subset the variables of a data table, use keep and drop statements.

data LIBREF.NEWTABLE;
    set LIBREF.OLDTABLE;
    keep VARLIST1;
    drop VARLIST2;
run;

De-duplication

This is actually best done using the SORT procedure. See here for details.


CategoryRicottone

SAS/DataStep (last edited 2023-03-30 20:36:31 by DominicRicottone)