Size: 1382
Comment:
|
← Revision 4 as of 2023-03-30 20:12:08 ⇥
Size: 1521
Comment:
|
Deletions are marked like this. | Additions are marked like this. |
Line 1: | Line 1: |
= SAS Sort Procedure = | = SAS Sort = |
Line 3: | Line 3: |
The '''SORT procedure''' in SAS sorts data. It can optionally remove duplicated cases. | A procedure for sorting and de-duplicating tables. |
Line 11: | Line 11: |
== Syntax == | == Usage == |
Line 13: | Line 13: |
Sort a table, either in-place or into a new table. | To sort a table and write data into a new table, try: |
Line 17: | Line 17: |
by VAR1; | by VAR; |
Line 21: | Line 21: |
Deduplicate a table. Produce two tables: unique cases to `TABLE2` and removed (duplicate) cases to `TABLE3`. | If the '''`OUT`''' option is not specified, the dataset is sorted in place. === NoDupKey === To remove duplicate cases by `VAR`, use the '''`NODUPKEY`''' option. {{{ proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2 nodupkey; by VAR; run; }}} === DupOut === To preserve duplicate cases, specify an alternate output on the '''`DUPOUT`''' option. Removed cases will instead be moved into this dataset. |
Line 25: | Line 43: |
by VARLIST; | by VAR; |
Line 31: | Line 49: |
== Examples == | |
Line 33: | Line 50: |
=== Basic Deduplication === | == Deduplication == === Basic === |
Line 36: | Line 56: |
proc sort data=LIBREF.OLDTABLE out=LIBREF.NEWTABLE nodupkey dupout=LIBREF.REMOVEDCASES |
proc sort data=LIBREF.OLDTABLE out=LIBREF.NEWTABLE nodupkey dupout=LIBREF.REMOVEDCASES |
Line 50: | Line 67: |
2. substitute the `nodupkey` option with `noduprecs` | 2. substitute the `NODUPKEY` option with `NODUPRECS` |
Line 54: | Line 71: |
=== Deduplication With Criteria === | === Advanced === |
Line 56: | Line 73: |
If there should be a preference between which case is kept, use another `SORT` procedure ''before'' the one specifying `nodupkey`. | If there should be a preference between which case is kept, use another `SORT` procedure ''before'' the one specifying `NODUPKEY`. |
Line 59: | Line 76: |
proc sort data=LIBREF.OLDTABLE out=LIBREF.TABLE1; by ID descending QUALITY DATE; |
proc sort data=LIBREF.OLDTABLE out=LIBREF.TABLE1; by ID descending QUALITY DATE; |
Line 64: | Line 80: |
proc sort data=LIBREF.TABLE1 out=LIBREF.NEWTABLE nodupkey dupout=LIBREF.TABLE2 by ID; |
proc sort data=LIBREF.TABLE1 out=LIBREF.NEWTABLE nodupkey dupout=LIBREF.TABLE2 by ID; |
Line 71: | Line 84: |
---- == Alternatives == |
SAS Sort
A procedure for sorting and de-duplicating tables.
Usage
To sort a table and write data into a new table, try:
proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2; by VAR; run;
If the OUT option is not specified, the dataset is sorted in place.
NoDupKey
To remove duplicate cases by VAR, use the NODUPKEY option.
proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2 nodupkey; by VAR; run;
DupOut
To preserve duplicate cases, specify an alternate output on the DUPOUT option. Removed cases will instead be moved into this dataset.
proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2 nodupkey dupout=LIBREF.TABLE3; by VAR; run;
Deduplication
Basic
proc sort data=LIBREF.OLDTABLE out=LIBREF.NEWTABLE nodupkey dupout=LIBREF.REMOVEDCASES by ID; run;
Perfect Duplicates
To remove perfect duplicates...
use _all_ on the BY statement
substitute the NODUPKEY option with NODUPRECS
Advanced
If there should be a preference between which case is kept, use another SORT procedure before the one specifying NODUPKEY.
proc sort data=LIBREF.OLDTABLE out=LIBREF.TABLE1; by ID descending QUALITY DATE; run; proc sort data=LIBREF.TABLE1 out=LIBREF.NEWTABLE nodupkey dupout=LIBREF.TABLE2 by ID; run;