SAS Sort
A procedure for sorting and de-duplicating datasets.
Usage
To sort a dataset and write into a new dataset, try:
proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2;
by VAR;
run;If the out option is not specified, the dataset is sorted in place.
NoDupKey
To remove duplicate cases by VAR, use the nodupkey option.
proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2 nodupkey;
by VAR;
run;
DupOut
To preserve duplicate cases, specify an alternate output on the dupout option. Removed cases will instead be moved into this dataset.
proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2 nodupkey dupout=LIBREF.TABLE3;
by VAR;
run;
Deduplication
Basic
proc sort data=LIBREF.OLDTABLE
out=LIBREF.NEWTABLE
nodupkey
dupout=LIBREF.REMOVEDCASES
by ID;
run;
Perfect Duplicates
To remove perfect duplicates...
use _all_ on the BY statement
substitute the nodupkey option with noduprecs
Advanced
If there should be a preference between which case is kept, use another SORT procedure before the one specifying nodupkey.
proc sort data=LIBREF.OLDTABLE
out=LIBREF.TABLE1;
by ID descending QUALITY DATE;
run;
proc sort data=LIBREF.TABLE1
out=LIBREF.NEWTABLE
nodupkey
dupout=LIBREF.TABLE2
by ID;
run;