Differences between revisions 5 and 12 (spanning 7 versions)
Revision 5 as of 2023-01-14 21:40:10
Size: 2281
Comment:
Revision 12 as of 2023-03-30 20:36:31
Size: 3397
Comment:
Deletions are marked like this. Additions are marked like this.
Line 1: Line 1:
## page was renamed from SASDataStep
Line 3: Line 2:

In SAS, data is manipulated in '''data steps'''.
Line 16: Line 17:
 * On `run`, write data into the step's `LIBREF.TABLE`  * On `run`, write data into the output table
Line 20: Line 21:
   STATEMENTS;   STATEMENTS;
Line 28: Line 29:
== Input and Output Tables == == Tables ==
Line 30: Line 31:
It's best practice to generate a new table to store transformations, rather than transforming data in place. The table named in a data step is always the ''output table''. By default it is also the ''input table'' and data is manipulated in place.
Line 32: Line 33:
The table named in a data step is always the output table. To set a different table as the input, use a `set` statement. To set a different table as the input, use a `set` statement.
Line 35: Line 36:
data LIBREF.NEWTABLE;
   set LIBREF.OLDTABLE;
data LIBREF.OUTPUT;
  set LIBREF.INPUT;
  S
TATEMENTS;
Line 40: Line 42:
The above code will simply copy data without transforming or filtering anything. This probably isn't all you want to do, however. ----



== Data Input ==

See [[SAS/ReadingData|here]].
Line 48: Line 56:


=== Creating Variables ===

{{{
data LIBREF.OUTPUT;
  set LIBREF.INPUT;

  /* declare numeric variables directly with numeric literals */
  foo = 1;

  /* declare string variables by specifying the length first, then with string literals */
  length hello $5 world $5;
  hello = 'Hello';
  world = 'World';

  /* the length of a string variable cannot be changed again within the same data step */
  length greeting $12;
  greeting = CAT(hello, world); /* 'Hello World' */
  greeting = CATS(hello, world); /* 'HelloWorld' */
  greeting = CATX(', ', hello, world); /* 'Hello, World' */
  greeting = hello||world; /* 'HelloWorld' */
run;
}}}

Any valid [[SAS/Expressions|expression]] can be used to create a new variable.


Line 52: Line 89:
   set LIBREF.OLDTABLE;   set LIBREF.OLDTABLE;
Line 54: Line 91:
   /* try converting alphanumeric to numeric */
    NUM_ID = input(ALNUM_ID, 8.);
  /* coerce string to numeric */
  numeric_zip_code = input(string_zip_code, 5.);
Line 57: Line 94:
   /* left-align text */
    length TEXT_VAR $999.;
   TEXT_VAR = put(TEXT_VAR, $999. -L);
  /* left-align text */
  length TEXT_VAR $999.;
  TEXT_VAR = put(TEXT_VAR, $999. -L);
Line 61: Line 98:
   /* parse timestamps like '01Jan1999' */
    format TIMESTAMP_VAR DATE9.;
    MONTH_VAR = month(TIMESTAMP_VAR);
  /* parse timestamps like '01Jan1999' */
  format timestamp DATE9.;
  month = month(timestamp);
Line 65: Line 102:
   /* round numerics to integers */
    format INT_VAR 10.;
  /* round numerics to integers */
  format test_score 3.;
Line 71: Line 108:

----



== Joining Data ==

See [[SAS/JoiningData|here]].
Line 84: Line 129:
    set LIBREF.OLDTABLE;
    where EXPR;
  set LIBREF.OLDTABLE;
  where EXPR;
run;
}}}

Note that `where` can only be used with variables that exist before the data step.

`if` statements can be used similarly and also bypass that last restriction. The downsides of this approach are that the operation is more computationally expensive, and there isn't a clear analogy to the `where` statements on procedures.

{{{
data LIBREF.NEWTABLE;
  set LIBREF.OLDTABLE;
  NEWVAR=1;
  if NEWVAR=1;
Line 93: Line 150:
   set LIBREF.OLDTABLE;
   if EXPR then output LIBREF.NEWTABLE1; else output LIBREF.NEWTABLE2;
  set LIBREF.OLDTABLE;
  if EXPR then output LIBREF.NEWTABLE1; else output LIBREF.NEWTABLE2;
Line 106: Line 163:
   set LIBREF.OLDTABLE;
    keep VARLIST1;
    drop VARLIST2;
  set LIBREF.OLDTABLE;
  keep VARLIST1;
  drop VARLIST2;
Line 116: Line 173:
This is actually best done using a procedure step. See [[CrosswalkDuplicates#SAS|here]] for details. This is actually best done using the `SORT` procedure. See [[SAS/Sort|here]] for details.

SAS Data Step

In SAS, data is manipulated in data steps.


Data Model

The data model for SAS is:

  • Read a row of data
  • Process statements sequentially
  • On run, write data into the output table

data LIBREF.TABLE;
  STATEMENTS;
run;


Tables

The table named in a data step is always the output table. By default it is also the input table and data is manipulated in place.

To set a different table as the input, use a set statement.

data LIBREF.OUTPUT;
  set LIBREF.INPUT;
  STATEMENTS;
run;


Data Input

See here.


Transforming Data

Creating Variables

data LIBREF.OUTPUT;
  set LIBREF.INPUT;

  /* declare numeric variables directly with numeric literals */
  foo = 1;

  /* declare string variables by specifying the length first, then with string literals */
  length hello $5 world $5;
  hello = 'Hello';
  world = 'World';

  /* the length of a string variable cannot be changed again within the same data step */
  length greeting $12;
  greeting = CAT(hello, world);        /* 'Hello World'  */
  greeting = CATS(hello, world);       /* 'HelloWorld'   */
  greeting = CATX(', ', hello, world); /* 'Hello, World' */
  greeting = hello||world;             /* 'HelloWorld'   */
run;

Any valid expression can be used to create a new variable.

Coercing Data Types

data LIBREF.NEWTABLE;
  set LIBREF.OLDTABLE;

  /* coerce string to numeric */
  numeric_zip_code = input(string_zip_code, 5.);

  /* left-align text */
  length TEXT_VAR $999.;
  TEXT_VAR = put(TEXT_VAR, $999. -L);

  /* parse timestamps like '01Jan1999' */
  format timestamp DATE9.;
  month = month(timestamp);

  /* round numerics to integers */
  format test_score 3.;
run;

The input function generates a variable _ERROR_ by default, flagging cases that could not be formatted. To suppress this variable's creation, use input(ALNUM_ID, ?? 8.).


Joining Data

See here.


Filtering Data

Subset Cases

To subset the cases of a data table, use where statements.

data LIBREF.NEWTABLE;
  set LIBREF.OLDTABLE;
  where EXPR;
run;

Note that where can only be used with variables that exist before the data step.

if statements can be used similarly and also bypass that last restriction. The downsides of this approach are that the operation is more computationally expensive, and there isn't a clear analogy to the where statements on procedures.

data LIBREF.NEWTABLE;
  set LIBREF.OLDTABLE;
  NEWVAR=1;
  if NEWVAR=1;
run;

Alternatively, toggle the output data table with if statements.

data LIBREF.NEWTABLE1;
  set LIBREF.OLDTABLE;
  if EXPR then output LIBREF.NEWTABLE1; else output LIBREF.NEWTABLE2;
run;

Subset Variables

To subset the variables of a data table, use keep and drop statements.

data LIBREF.NEWTABLE;
  set LIBREF.OLDTABLE;
  keep VARLIST1;
  drop VARLIST2;
run;

De-duplication

This is actually best done using the SORT procedure. See here for details.


CategoryRicottone

SAS/DataStep (last edited 2023-03-30 20:36:31 by DominicRicottone)