Aggregating Data
Calculating higher-level values
Scratch Variables
Scratch variables are declared with a leading hash (#). They are never re-initialized between cases, and they are discarded once the transformations are executed.
compute #total = sum(sale,#total).
compute total = #total.
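As a rough illustration of the running-total idiom above, here is a minimal sketch with made-up inline data. The variable name sale and the data values are assumptions for demonstration only.

```spss
* Hypothetical sample data with one row per sale.
data list free / sale.
begin data
10 20 5
end data.

* The scratch variable keeps its value from case to case.
compute #total = sum(sale, #total).
* Copy it into a permanent variable so the result survives execution.
compute total = #total.
execute.
list.
```

After this runs, total should hold a running sum of sale, while #total itself is no longer available.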
Leave
The LEAVE command explicitly suppresses re-initialization for a given variable. In all other contexts, these are normal variables with normal behavior.
compute total = sum(sale,total).
leave total.
Aggregate
The AGGREGATE command can be used to calculate higher-level values.
To calculate the age of the oldest customer from a dataset of sales, try:
aggregate outfile=* mode=addvariables /oldest_customer_age = max(age).
OUTFILE=* and MODE=ADDVARIABLES are the defaults and can be omitted.
Note: if the calculated variables may collide with pre-existing variable names, add OVERWRITE=YES so that the existing variables are replaced.
Functions
See Aggregate Functions.
Break Groups
To calculate a higher-level value for each subgroup, use the /BREAK=VARLIST subcommand. The dataset must be sorted by VARLIST.
If the dataset is already sorted, use the /PRESORTED subcommand to skip re-processing. The /PRESORTED subcommand must precede the /BREAK subcommand.
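The break-group behavior described above might look like the following sketch. The data values and the variable names region, age, and sale are hypothetical; the explicit sort is there only so that /PRESORTED is safe to use.

```spss
* Hypothetical sample data with one row per sale.
data list free / region (a5) age sale.
begin data
north 34 10
south 51 5
north 62 20
end data.

* Sort by the break variable so that PRESORTED can be used.
sort cases by region.

* Append each region's oldest customer age and total sales to every case.
aggregate outfile=* mode=addvariables
  /presorted
  /break=region
  /oldest_customer_age = max(age)
  /region_total = sum(sale).
list.
```

Every case keeps its own row; the two new variables repeat the same value for all cases in a region.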
Aggregating Cases
Aggregate
The AGGREGATE command can be used to create a higher-level dataset.
This is mostly useful with break groups: one case is generated for each group.
To create a dataset with a row for each region, try:
dataset declare NEW.
aggregate outfile=NEW /break=region /total_sales=sum(sale).
If the dataset is already sorted, use the /PRESORTED subcommand to skip re-processing. The /PRESORTED subcommand must precede the /BREAK subcommand.
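For a fuller picture, the sketch below builds a small dataset and aggregates it into a new one. The dataset names sales and regions, the variable names, and the data values are all assumptions chosen for illustration.

```spss
* Hypothetical sample data with one row per sale.
data list free / region (a5) sale.
begin data
north 10
south 5
north 20
end data.

* Name the active dataset so it stays open when another dataset is activated.
dataset name sales.

* Declare the target dataset, then write one case per region into it.
dataset declare regions.
aggregate outfile=regions
  /break=region
  /total_sales = sum(sale)
  /n_sales = n.

* Switch to the aggregated dataset to inspect it.
dataset activate regions.
list.
```

The original sales dataset is left unchanged; the regions dataset should contain one row per distinct region.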
Functions
See Aggregate Functions.
Transforming Long to Wide
The CASESTOVARS command transforms a data set from long to wide format. An ID variable identifies the cases that should be combined into a single case, and an index variable distinguishes the cases within each group.
The index variable (or multiple index variables) can be either numeric or string, but all cases must have a non-missing value.
Constants should be specified on the /FIXED subcommand. SPSS will automatically inspect variables and warn about any that appear to be constants but aren't specified here.
The remaining variables will be duplicated for each case that is aggregated together. Extraneous variables can and should be dropped to save time.
casestovars /id=ID /index=INDEX /fixed=INDEX_LEVEL_VARIABLES /drop=FOO BAR BAZ.
See a concrete example in the long-to-wide crosswalk.
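As a small self-contained sketch of the command above, assume a hypothetical long dataset with one row per customer visit. The variable names id, visit, region, and sale are illustrative, and sorting by the ID and index first is a precaution rather than something stated on this page.

```spss
* Hypothetical long data with one row per customer per visit.
data list free / region (a5) id visit sale.
begin data
north 1 1 10
north 1 2 20
south 2 1 5
end data.

* Sort by the ID and index variables before restructuring.
sort cases by id visit.

* The region value is constant within each id, so it is listed on FIXED.
casestovars
  /id=id
  /index=visit
  /fixed=region.
list.
```

The result should have one row per id, with a separate sale column for each visit value.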