Differences between revisions 2 and 4 (spanning 2 versions)
Revision 2 as of 2023-03-30 15:13:50
Size: 1595
Comment:
Revision 4 as of 2023-03-30 20:12:08
Size: 1521
Comment:
Deletions are marked like this. Additions are marked like this.
Line 3: Line 3:
A procedure for sorting and de-duplicating datasets. A procedure for sorting and de-duplicating tables.
Line 13: Line 13:
To sort a dataset and write into a new dataset, try: To sort a table and write data into a new table, try:
Line 21: Line 21:
If the '''`out`''' option is not specified, the dataset is sorted in place. If the '''`OUT`''' option is not specified, the dataset is sorted in place.
Line 27: Line 27:
To remove duplicate cases by `VAR`, use the '''`nodupkey`''' option. To remove duplicate cases by `VAR`, use the '''`NODUPKEY`''' option.
Line 39: Line 39:
To preserve duplicate cases, specify an alternate output on the '''`dupout`''' option. Removed cases will instead be moved into this dataset. To preserve duplicate cases, specify an alternate output on the '''`DUPOUT`''' option. Removed cases will instead be moved into this dataset.
Line 56: Line 56:
proc sort data=LIBREF.OLDTABLE
         
out=LIBREF.NEWTABLE
         
nodupkey
         
dupout=LIBREF.REMOVEDCASES
proc sort data=LIBREF.OLDTABLE out=LIBREF.NEWTABLE nodupkey dupout=LIBREF.REMOVEDCASES
Line 70: Line 67:
 2. substitute the `nodupkey` option with `noduprecs`  2. substitute the `NODUPKEY` option with `NODUPRECS`
Line 76: Line 73:
If there should be a preference between which case is kept, use another `SORT` procedure ''before'' the one specifying `nodupkey`. If there should be a preference between which case is kept, use another `SORT` procedure ''before'' the one specifying `NODUPKEY`.
Line 79: Line 76:
proc sort data=LIBREF.OLDTABLE
         
out=LIBREF.TABLE1;
  by ID descending QUALITY DATE;
proc sort data=LIBREF.OLDTABLE out=LIBREF.TABLE1;
   by ID descending QUALITY DATE;
Line 84: Line 80:
proc sort data=LIBREF.TABLE1
         
out=LIBREF.NEWTABLE
         
nodupkey
         
dupout=LIBREF.TABLE2
  by ID;
proc sort data=LIBREF.TABLE1 out=LIBREF.NEWTABLE nodupkey dupout=LIBREF.TABLE2
    by ID;

SAS Sort

A procedure for sorting and de-duplicating tables.


Usage

To sort a table and write data into a new table, try:

proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2;
    by VAR;
run;

If the OUT option is not specified, the dataset is sorted in place.

NoDupKey

To remove duplicate cases by VAR, use the NODUPKEY option.

proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2 nodupkey;
    by VAR;
run;

DupOut

To preserve duplicate cases, specify an alternate output on the DUPOUT option. Removed cases will instead be moved into this dataset.

proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2 nodupkey dupout=LIBREF.TABLE3;
    by VAR;
run;


Deduplication

Basic

proc sort data=LIBREF.OLDTABLE out=LIBREF.NEWTABLE nodupkey dupout=LIBREF.REMOVEDCASES
  by ID;
run;

Perfect Duplicates

To remove perfect duplicates...

  1. use _all_ on the BY statement

  2. substitute the NODUPKEY option with NODUPRECS

Advanced

If there should be a preference between which case is kept, use another SORT procedure before the one specifying NODUPKEY.

proc sort data=LIBREF.OLDTABLE out=LIBREF.TABLE1;
    by ID descending QUALITY DATE;
run;

proc sort data=LIBREF.TABLE1 out=LIBREF.NEWTABLE nodupkey dupout=LIBREF.TABLE2
    by ID;
run;


CategoryRicottone

SAS/Sort (last edited 2023-03-30 20:12:08 by DominicRicottone)