Differences between revisions 1 and 4 (spanning 3 versions)
Revision 1 as of 2022-06-08 20:27:05
Size: 1382
Comment:
Revision 4 as of 2023-03-30 20:12:08
Size: 1521
Comment:
Deletions are marked like this. Additions are marked like this.
Line 1: Line 1:
= SAS Sort Procedure = = SAS Sort =
Line 3: Line 3:
The '''SORT procedure''' in SAS sorts data. It can optionally remove duplicated cases. A procedure for sorting and de-duplicating tables.
Line 11: Line 11:
== Syntax == == Usage ==
Line 13: Line 13:
Sort a table, either in-place or into a new table. To sort a table and write data into a new table, try:
Line 17: Line 17:
    by VAR1;     by VAR;
Line 21: Line 21:
Deduplicate a table. Produce two tables: unique cases to `TABLE2` and removed (duplicate) cases to `TABLE3`. If the '''`OUT`''' option is not specified, the dataset is sorted in place.



=== NoDupKey ===

To remove duplicate cases by `VAR`, use the '''`NODUPKEY`''' option.

{{{
proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2 nodupkey;
    by VAR;
run;
}}}



=== DupOut ===

To preserve duplicate cases, specify an alternate output on the '''`DUPOUT`''' option. Removed cases will instead be moved into this dataset.
Line 25: Line 43:
    by VARLIST;     by VAR;
Line 31: Line 49:
== Examples ==
Line 33: Line 50:
=== Basic Deduplication ===
== Deduplication ==

=== Basic ===
Line 36: Line 56:
proc sort data=LIBREF.OLDTABLE
         
out=LIBREF.NEWTABLE
         
nodupkey
         
dupout=LIBREF.REMOVEDCASES
proc sort data=LIBREF.OLDTABLE out=LIBREF.NEWTABLE nodupkey dupout=LIBREF.REMOVEDCASES
Line 50: Line 67:
 2. substitute the `nodupkey` option with `noduprecs`  2. substitute the `NODUPKEY` option with `NODUPRECS`
Line 54: Line 71:
=== Deduplication With Criteria === === Advanced ===
Line 56: Line 73:
If there should be a preference between which case is kept, use another `SORT` procedure ''before'' the one specifying `nodupkey`. If there should be a preference between which case is kept, use another `SORT` procedure ''before'' the one specifying `NODUPKEY`.
Line 59: Line 76:
proc sort data=LIBREF.OLDTABLE
         
out=LIBREF.TABLE1;
  by ID descending QUALITY DATE;
proc sort data=LIBREF.OLDTABLE out=LIBREF.TABLE1;
   by ID descending QUALITY DATE;
Line 64: Line 80:
proc sort data=LIBREF.TABLE1
         
out=LIBREF.NEWTABLE
         
nodupkey
         
dupout=LIBREF.TABLE2
  by ID;
proc sort data=LIBREF.TABLE1 out=LIBREF.NEWTABLE nodupkey dupout=LIBREF.TABLE2
    by ID;
Line 71: Line 84:

----



== Alternatives ==

SAS Sort

A procedure for sorting and de-duplicating tables.


Usage

To sort a table and write data into a new table, try:

proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2;
    by VAR;
run;

If the OUT option is not specified, the dataset is sorted in place.

NoDupKey

To remove duplicate cases by VAR, use the NODUPKEY option.

proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2 nodupkey;
    by VAR;
run;

DupOut

To preserve duplicate cases, specify an alternate output on the DUPOUT option. Removed cases will instead be moved into this dataset.

proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2 nodupkey dupout=LIBREF.TABLE3;
    by VAR;
run;


Deduplication

Basic

proc sort data=LIBREF.OLDTABLE out=LIBREF.NEWTABLE nodupkey dupout=LIBREF.REMOVEDCASES
  by ID;
run;

Perfect Duplicates

To remove perfect duplicates...

  1. use _all_ on the BY statement

  2. substitute the NODUPKEY option with NODUPRECS

Advanced

If there should be a preference between which case is kept, use another SORT procedure before the one specifying NODUPKEY.

proc sort data=LIBREF.OLDTABLE out=LIBREF.TABLE1;
    by ID descending QUALITY DATE;
run;

proc sort data=LIBREF.TABLE1 out=LIBREF.NEWTABLE nodupkey dupout=LIBREF.TABLE2
    by ID;
run;


CategoryRicottone

SAS/Sort (last edited 2023-03-30 20:12:08 by DominicRicottone)