Differences between revisions 1 and 2
Revision 1 as of 2022-06-08 20:27:05
Size: 1382
Comment:
Revision 2 as of 2023-03-30 15:13:50
Size: 1595
Comment:
Deletions are marked like this. Additions are marked like this.
Line 1: Line 1:
= SAS Sort Procedure = = SAS Sort =
Line 3: Line 3:
The '''SORT procedure''' in SAS sorts data. It can optionally remove duplicated cases. A procedure for sorting and de-duplicating datasets.
Line 11: Line 11:
== Syntax == == Usage ==
Line 13: Line 13:
Sort a table, either in-place or into a new table. To sort a dataset and write into a new dataset, try:
Line 17: Line 17:
    by VAR1;     by VAR;
Line 21: Line 21:
Deduplicate a table. Produce two tables: unique cases to `TABLE2` and removed (duplicate) cases to `TABLE3`. If the '''`out`''' option is not specified, the dataset is sorted in place.



=== NoDupKey ===

To remove duplicate cases by `VAR`, use the '''`nodupkey`''' option.

{{{
proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2 nodupkey;
    by VAR;
run;
}}}



=== DupOut ===

To preserve duplicate cases, specify an alternate output on the '''`dupout`''' option. Removed cases will instead be moved into this dataset.
Line 25: Line 43:
    by VARLIST;     by VAR;
Line 31: Line 49:
== Examples ==
Line 33: Line 50:
=== Basic Deduplication ===
== Deduplication ==

=== Basic ===
Line 54: Line 74:
=== Deduplication With Criteria === === Advanced ===
Line 72: Line 92:
----



== Alternatives ==

SAS Sort

A procedure for sorting and de-duplicating datasets.


Usage

To sort a dataset and write into a new dataset, try:

proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2;
    by VAR;
run;

If the out option is not specified, the dataset is sorted in place.

NoDupKey

To remove duplicate cases by VAR, use the nodupkey option.

proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2 nodupkey;
    by VAR;
run;

DupOut

To preserve duplicate cases, specify an alternate output on the dupout option. Removed cases will instead be moved into this dataset.

proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2 nodupkey dupout=LIBREF.TABLE3;
    by VAR;
run;


Deduplication

Basic

proc sort data=LIBREF.OLDTABLE
          out=LIBREF.NEWTABLE
          nodupkey
          dupout=LIBREF.REMOVEDCASES
  by ID;
run;

Perfect Duplicates

To remove perfect duplicates...

  1. use _all_ on the BY statement

  2. substitute the nodupkey option with noduprecs

Advanced

If there should be a preference between which case is kept, use another SORT procedure before the one specifying nodupkey.

proc sort data=LIBREF.OLDTABLE
          out=LIBREF.TABLE1;
  by ID descending QUALITY DATE;
run;

proc sort data=LIBREF.TABLE1
          out=LIBREF.NEWTABLE
          nodupkey
          dupout=LIBREF.TABLE2
  by ID;
run;


CategoryRicottone

SAS/Sort (last edited 2023-03-30 20:12:08 by DominicRicottone)