Differences between revisions 1 and 2

SAS Sort

A procedure for sorting and de-duplicating datasets.

Contents

SAS Sort
1. Usage
  1. NoDupKey
  2. DupOut
2. Deduplication

Usage

To sort a dataset and write into a new dataset, try:

proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2;
    by VAR;
run;

If the out option is not specified, the dataset is sorted in place.

NoDupKey

To remove duplicate cases by VAR, use the nodupkey option.

proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2 nodupkey;
    by VAR;
run;

DupOut

To preserve duplicate cases, specify an alternate output on the dupout option. Removed cases will instead be moved into this dataset.

proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2 nodupkey dupout=LIBREF.TABLE3;
    by VAR;
run;

Deduplication

Basic

proc sort data=LIBREF.OLDTABLE
          out=LIBREF.NEWTABLE
          nodupkey
          dupout=LIBREF.REMOVEDCASES
  by ID;
run;

Perfect Duplicates

To remove perfect duplicates...

use _all_ on the BY statement
substitute the nodupkey option with noduprecs

Advanced

If there should be a preference between which case is kept, use another SORT procedure before the one specifying nodupkey.

proc sort data=LIBREF.OLDTABLE
          out=LIBREF.TABLE1;
  by ID descending QUALITY DATE;
run;

proc sort data=LIBREF.TABLE1
          out=LIBREF.NEWTABLE
          nodupkey
          dupout=LIBREF.TABLE2
  by ID;
run;

CategoryRicottone

-  ⇤ ← Revision 1 as of 2022-06-08 20:27:05 → 
  Size: 1382
  Editor: DominicRicottone
  Comment:
+   ← Revision 2 as of 2023-03-30 15:13:50 → ⇥
  Size: 1595
  Editor: DominicRicottone
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 1:
-= SAS Sort Procedure =
+= SAS Sort =
 Line 3:
-The '''SORT procedure''' in SAS sorts data. It can optionally remove duplicated cases.
+A procedure for sorting and de-duplicating datasets.
 Line 11:
-== Syntax ==
+== Usage ==
 Line 13:
-Sort a table, either in-place or into a new table.
+To sort a dataset and write into a new dataset, try:
 Line 17:
-    by VAR1;
+    by VAR;
 Line 21:
-Deduplicate a table. Produce two tables: unique cases to `TABLE2` and removed (duplicate) cases to `TABLE3`.
+If the '''`out`''' option is not specified, the dataset is sorted in place.



=== NoDupKey ===

To remove duplicate cases by `VAR`, use the '''`nodupkey`''' option.

{{{
proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2 nodupkey;
    by VAR;
run;
}}}



=== DupOut ===

To preserve duplicate cases, specify an alternate output on the '''`dupout`''' option. Removed cases will instead be moved into this dataset.
-Line 25:
+Line 43:
-    by VARLIST;
+    by VAR;
-Line 31:
+Line 49:
-== Examples ==
-Line 33:
+Line 50:
-=== Basic Deduplication ===
+== Deduplication ==

=== Basic ===
-Line 54:
+Line 74:
-=== Deduplication With Criteria ===
+=== Advanced ===
-Line 72:
+Line 92:
-----



== Alternatives ==

Diff for "SAS/Sort"