|
Size: 1595
Comment:
|
Size: 1520
Comment:
|
| Deletions are marked like this. | Additions are marked like this. |
| Line 3: | Line 3: |
| A procedure for sorting and de-duplicating datasets. | A procedure for sorting and de-duplicating tables. |
| Line 21: | Line 21: |
| If the '''`out`''' option is not specified, the dataset is sorted in place. | If the '''`OUT`''' option is not specified, the dataset is sorted in place. |
| Line 27: | Line 27: |
| To remove duplicate cases by `VAR`, use the '''`nodupkey`''' option. | To remove duplicate cases by `VAR`, use the '''`NODUPKEY`''' option. |
| Line 39: | Line 39: |
| To preserve duplicate cases, specify an alternate output on the '''`dupout`''' option. Removed cases will instead be moved into this dataset. | To preserve duplicate cases, specify an alternate output on the '''`DUPOUT`''' option. Removed cases will instead be moved into this dataset. |
| Line 56: | Line 56: |
| proc sort data=LIBREF.OLDTABLE out=LIBREF.NEWTABLE nodupkey dupout=LIBREF.REMOVEDCASES |
proc sort data=LIBREF.OLDTABLE out=LIBREF.NEWTABLE nodupkey dupout=LIBREF.REMOVEDCASES |
| Line 70: | Line 67: |
| 2. substitute the `nodupkey` option with `noduprecs` | 2. substitute the `NODUPKEY` option with `NODUPRECS` |
| Line 76: | Line 73: |
| If there should be a preference between which case is kept, use another `SORT` procedure ''before'' the one specifying `nodupkey`. | If there should be a preference between which case is kept, use another `SORT` procedure ''before'' the one specifying `NODUPKEY`. |
| Line 79: | Line 76: |
| proc sort data=LIBREF.OLDTABLE out=LIBREF.TABLE1; by ID descending QUALITY DATE; |
proc sort data=LIBREF.OLDTABLE out=LIBREF.TABLE1; by ID descending QUALITY DATE; |
| Line 84: | Line 80: |
| proc sort data=LIBREF.TABLE1 out=LIBREF.NEWTABLE nodupkey dupout=LIBREF.TABLE2 by ID; |
proc sort data=LIBREF.TABLE1 out=LIBREF.NEWTABLE nodupkey dupout=LIBREF.TABLE2 by ID; |
SAS Sort
A procedure for sorting and de-duplicating tables.
Usage
To sort a dataset and write into a new dataset, try:
proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2;
by VAR;
run;If the OUT option is not specified, the dataset is sorted in place.
NoDupKey
To remove duplicate cases by VAR, use the NODUPKEY option.
proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2 nodupkey;
by VAR;
run;
DupOut
To preserve duplicate cases, specify an alternate output on the DUPOUT option. Removed cases will instead be moved into this dataset.
proc sort data=LIBREF.TABLE1 out=LIBREF.TABLE2 nodupkey dupout=LIBREF.TABLE3;
by VAR;
run;
Deduplication
Basic
proc sort data=LIBREF.OLDTABLE out=LIBREF.NEWTABLE nodupkey dupout=LIBREF.REMOVEDCASES by ID; run;
Perfect Duplicates
To remove perfect duplicates...
use _all_ on the BY statement
substitute the NODUPKEY option with NODUPRECS
Advanced
If there should be a preference between which case is kept, use another SORT procedure before the one specifying NODUPKEY.
proc sort data=LIBREF.OLDTABLE out=LIBREF.TABLE1;
by ID descending QUALITY DATE;
run;
proc sort data=LIBREF.TABLE1 out=LIBREF.NEWTABLE nodupkey dupout=LIBREF.TABLE2
by ID;
run;