SAS Sort

A procedure for sorting and de-duplicating datasets.


Usage

To sort a dataset and write into a new dataset, try:

proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2;
    by VAR;
run;

If the out option is not specified, the dataset is sorted in place.

NoDupKey

To remove duplicate cases by VAR, use the nodupkey option.

proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2 nodupkey;
    by VAR;
run;

DupOut

To preserve duplicate cases, specify an alternate output on the dupout option. Removed cases will instead be moved into this dataset.

proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2 nodupkey dupout=LIBREF.TABLE3;
    by VAR;
run;


Deduplication

Basic

proc sort data=LIBREF.OLDTABLE
          out=LIBREF.NEWTABLE
          nodupkey
          dupout=LIBREF.REMOVEDCASES
  by ID;
run;

Perfect Duplicates

To remove perfect duplicates...

  1. use _all_ on the BY statement

  2. substitute the nodupkey option with noduprecs

Advanced

If there should be a preference between which case is kept, use another SORT procedure before the one specifying nodupkey.

proc sort data=LIBREF.OLDTABLE
          out=LIBREF.TABLE1;
  by ID descending QUALITY DATE;
run;

proc sort data=LIBREF.TABLE1
          out=LIBREF.NEWTABLE
          nodupkey
          dupout=LIBREF.TABLE2
  by ID;
run;


CategoryRicottone