Differences between revisions 2 and 3

SAS Sort

A procedure for sorting and de-duplicating tables.

Contents

SAS Sort
1. Usage
  1. NoDupKey
  2. DupOut
2. Deduplication

Usage

To sort a dataset and write into a new dataset, try:

proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2;
    by VAR;
run;

If the OUT option is not specified, the dataset is sorted in place.

NoDupKey

To remove duplicate cases by VAR, use the NODUPKEY option.

proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2 nodupkey;
    by VAR;
run;

DupOut

To preserve duplicate cases, specify an alternate output on the DUPOUT option. Removed cases will instead be moved into this dataset.

proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2 nodupkey dupout=LIBREF.TABLE3;
    by VAR;
run;

Deduplication

Basic

proc sort data=LIBREF.OLDTABLE out=LIBREF.NEWTABLE nodupkey dupout=LIBREF.REMOVEDCASES
  by ID;
run;

Perfect Duplicates

To remove perfect duplicates...

use _all_ on the BY statement
substitute the NODUPKEY option with NODUPRECS

Advanced

If there should be a preference between which case is kept, use another SORT procedure before the one specifying NODUPKEY.

proc sort data=LIBREF.OLDTABLE out=LIBREF.TABLE1;
    by ID descending QUALITY DATE;
run;

proc sort data=LIBREF.TABLE1 out=LIBREF.NEWTABLE nodupkey dupout=LIBREF.TABLE2
    by ID;
run;

CategoryRicottone

-  ⇤ ← Revision 2 as of 2023-03-30 15:13:50 → 
  Size: 1595
  Editor: DominicRicottone
  Comment:
+   ← Revision 3 as of 2023-03-30 20:11:48 → ⇥
  Size: 1520
  Editor: DominicRicottone
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 3:
-A procedure for sorting and de-duplicating datasets.
+A procedure for sorting and de-duplicating tables.
 Line 21:
-If the '''`out`''' option is not specified, the dataset is sorted in place.
+If the '''`OUT`''' option is not specified, the dataset is sorted in place.
 Line 27:
-To remove duplicate cases by `VAR`, use the '''`nodupkey`''' option.
+To remove duplicate cases by `VAR`, use the '''`NODUPKEY`''' option.
 Line 39:
-To preserve duplicate cases, specify an alternate output on the '''`dupout`''' option. Removed cases will instead be moved into this dataset.
+To preserve duplicate cases, specify an alternate output on the '''`DUPOUT`''' option. Removed cases will instead be moved into this dataset.
 Line 56:
-proc sort data=LIBREF.OLDTABLE
          out=LIBREF.NEWTABLE
          nodupkey
          dupout=LIBREF.REMOVEDCASES
+proc sort data=LIBREF.OLDTABLE out=LIBREF.NEWTABLE nodupkey dupout=LIBREF.REMOVEDCASES
-Line 70:
+Line 67:
-. substitute the `nodupkey` option with `noduprecs`
+. substitute the `NODUPKEY` option with `NODUPRECS`
-Line 76:
+Line 73:
-If there should be a preference between which case is kept, use another `SORT` procedure ''before'' the one specifying `nodupkey`.
+If there should be a preference between which case is kept, use another `SORT` procedure ''before'' the one specifying `NODUPKEY`.
-Line 79:
+Line 76:
-proc sort data=LIBREF.OLDTABLE
          out=LIBREF.TABLE1;
  by ID descending QUALITY DATE;
+proc sort data=LIBREF.OLDTABLE out=LIBREF.TABLE1;
    by ID descending QUALITY DATE;
-Line 84:
+Line 80:
-proc sort data=LIBREF.TABLE1
          out=LIBREF.NEWTABLE
          nodupkey
          dupout=LIBREF.TABLE2
  by ID;
+proc sort data=LIBREF.TABLE1 out=LIBREF.NEWTABLE nodupkey dupout=LIBREF.TABLE2
    by ID;

Diff for "SAS/Sort"