|
⇤ ← Revision 1 as of 2022-06-08 20:27:05
Size: 1382
Comment:
|
Size: 1595
Comment:
|
| Deletions are marked like this. | Additions are marked like this. |
| Line 1: | Line 1: |
| = SAS Sort Procedure = | = SAS Sort = |
| Line 3: | Line 3: |
| The '''SORT procedure''' in SAS sorts data. It can optionally remove duplicated cases. | A procedure for sorting and de-duplicating datasets. |
| Line 11: | Line 11: |
| == Syntax == | == Usage == |
| Line 13: | Line 13: |
| Sort a table, either in-place or into a new table. | To sort a dataset and write into a new dataset, try: |
| Line 17: | Line 17: |
| by VAR1; | by VAR; |
| Line 21: | Line 21: |
| Deduplicate a table. Produce two tables: unique cases to `TABLE2` and removed (duplicate) cases to `TABLE3`. | If the '''`out`''' option is not specified, the dataset is sorted in place. === NoDupKey === To remove duplicate cases by `VAR`, use the '''`nodupkey`''' option. {{{ proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2 nodupkey; by VAR; run; }}} === DupOut === To preserve duplicate cases, specify an alternate output on the '''`dupout`''' option. Removed cases will instead be moved into this dataset. |
| Line 25: | Line 43: |
| by VARLIST; | by VAR; |
| Line 31: | Line 49: |
| == Examples == | |
| Line 33: | Line 50: |
| === Basic Deduplication === | == Deduplication == === Basic === |
| Line 54: | Line 74: |
| === Deduplication With Criteria === | === Advanced === |
| Line 72: | Line 92: |
| ---- == Alternatives == |
SAS Sort
A procedure for sorting and de-duplicating datasets.
Usage
To sort a dataset and write into a new dataset, try:
proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2;
by VAR;
run;If the out option is not specified, the dataset is sorted in place.
NoDupKey
To remove duplicate cases by VAR, use the nodupkey option.
proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2 nodupkey;
by VAR;
run;
DupOut
To preserve duplicate cases, specify an alternate output on the dupout option. Removed cases will instead be moved into this dataset.
proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2 nodupkey dupout=LIBREF.TABLE3;
by VAR;
run;
Deduplication
Basic
proc sort data=LIBREF.OLDTABLE
out=LIBREF.NEWTABLE
nodupkey
dupout=LIBREF.REMOVEDCASES
by ID;
run;
Perfect Duplicates
To remove perfect duplicates...
use _all_ on the BY statement
substitute the nodupkey option with noduprecs
Advanced
If there should be a preference between which case is kept, use another SORT procedure before the one specifying nodupkey.
proc sort data=LIBREF.OLDTABLE
out=LIBREF.TABLE1;
by ID descending QUALITY DATE;
run;
proc sort data=LIBREF.TABLE1
out=LIBREF.NEWTABLE
nodupkey
dupout=LIBREF.TABLE2
by ID;
run;