Differences between revisions 1 and 6 (spanning 5 versions)
Revision 1 as of 2019-12-06 02:09:48
Size: 996
Comment: Initial commit
Revision 6 as of 2023-01-14 21:43:20
Size: 2070
Comment:
Deletions are marked like this. Additions are marked like this.
Line 1: Line 1:
= SAS Data Model = = SAS Data Step =

<<TableOfContents>>

----



== Data Model ==
Line 4: Line 12:
Line 10: Line 19:
    STATEMENTS;
run;
}}}



= Quick Tips =

== Copying Tables ==

{{{
data LIBREF.NEWTABLE;
    set LIBREF.OLDTABLE;
  STATEMENTS;
Line 28: Line 25:
== Subsetting Tables ==

== Tables ==

The table named in a data step is always the ''output table''. By default it is also the ''input table'' and data is manipulated in place.

To set a different table as the input, use a `set` statement.
Line 31: Line 34:
data LIBREF.NEWTABLE;
    set LIBREF.OLDTABLE;
    where EXPR;
    keep VARLIST1;
    drop VARLIST2;
data LIBREF.OUTPUT;
  set LIBREF.INPUT;
  STATEMENTS;
Line 41: Line 42:
== Coercing Data Types ==

== Transforming Data ==

=
== Coercing Data Types ===
Line 48: Line 53:
    NUM_ID = input(ANUM_ID, 8.);     NUM_ID = input(ALNUM_ID, 8.);
Line 63: Line 68:
The `input` function generates a variable `_ERROR_` by default, flagging cases that could not be formatted. To suppress this variable's creation, use `input(ALNUM_ID, ?? 8.)`.

----



== Filtering Data ==

=== Subset Cases ===

To subset the cases of a data table, use `where` statements.

{{{
data LIBREF.NEWTABLE;
    set LIBREF.OLDTABLE;
    where EXPR;
run;
}}}

Alternatively, toggle the output data table with `if` statements.

{{{
data LIBREF.NEWTABLE1;
    set LIBREF.OLDTABLE;
    if EXPR then output LIBREF.NEWTABLE1; else output LIBREF.NEWTABLE2;
run;
}}}



=== Subset Variables ===

To subset the variables of a data table, use `keep` and `drop` statements.

{{{
data LIBREF.NEWTABLE;
    set LIBREF.OLDTABLE;
    keep VARLIST1;
    drop VARLIST2;
run;
}}}



=== De-duplication ===

This is actually best done using a procedure step. See [[CrosswalkDuplicates#SAS|here]] for details.

SAS Data Step


Data Model

The data model for SAS is:

  • Read a row of data
  • Process statements sequentially
  • On run, write data into the step's LIBREF.TABLE

data LIBREF.TABLE;
  STATEMENTS;
run;


Tables

The table named in a data step is always the output table. By default it is also the input table and data is manipulated in place.

To set a different table as the input, use a set statement.

data LIBREF.OUTPUT;
  set LIBREF.INPUT;
  STATEMENTS;
run;


Transforming Data

Coercing Data Types

data LIBREF.NEWTABLE;
    set LIBREF.OLDTABLE;

    /* try converting alphanumeric to numeric */
    NUM_ID = input(ALNUM_ID, 8.);

    /* left-align text */
    length TEXT_VAR $999.;
    TEXT_VAR = put(TEXT_VAR, $999. -L);

    /* parse timestamps like '01Jan1999' */
    format TIMESTAMP_VAR DATE9.;
    MONTH_VAR = month(TIMESTAMP_VAR);

    /* round numerics to integers */
    format INT_VAR 10.;
run;

The input function generates a variable _ERROR_ by default, flagging cases that could not be formatted. To suppress this variable's creation, use input(ALNUM_ID, ?? 8.).


Filtering Data

Subset Cases

To subset the cases of a data table, use where statements.

data LIBREF.NEWTABLE;
    set LIBREF.OLDTABLE;
    where EXPR;
run;

Alternatively, toggle the output data table with if statements.

data LIBREF.NEWTABLE1;
    set LIBREF.OLDTABLE;
    if EXPR then output LIBREF.NEWTABLE1; else output LIBREF.NEWTABLE2;
run;

Subset Variables

To subset the variables of a data table, use keep and drop statements.

data LIBREF.NEWTABLE;
    set LIBREF.OLDTABLE;
    keep VARLIST1;
    drop VARLIST2;
run;

De-duplication

This is actually best done using a procedure step. See here for details.


CategoryRicottone

SAS/DataStep (last edited 2023-03-30 20:36:31 by DominicRicottone)