Aggregating Data


Computing Aggregated Variables

Scratch Variables

Scratch variables are declared with a leading hash (#). They are initialized to 0 (numeric) or blank (string), are never re-initialized between cases, and are discarded once the next procedure or EXECUTE command runs.

compute #total = sum(sale,#total).
compute total = #total.
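
Because a scratch variable carries over from case to case, it can also hold a running total that restarts for each group. The following is a minimal sketch, assuming a hypothetical grouping variable named region and a hypothetical output variable named group_total. Neither name comes from the example above.

```spss
* Assumption: region is a hypothetical grouping variable.
sort cases by region.

* Reset the scratch accumulator at the first case of each group.
do if ($casenum = 1 or region <> lag(region)).
  compute #total = 0.
end if.

* Accumulate, then copy into a permanent variable.
compute #total = sum(sale,#total).
compute group_total = #total.
execute.
```

The copy into group_total is needed because #total is discarded once EXECUTE runs.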

Leave

The LEAVE command explicitly suppresses re-initialization for a given variable. In all other contexts, these are normal variables with normal behavior.

compute total = sum(sale,total).
leave total.

Aggregate

The AGGREGATE command can either create an aggregated data set or append aggregated variables to the active data set. The latter is the default behavior and is what is explored here. For the former, see below.

aggregate outfile=* mode=addvariables
  /last_casenum = max(casenum)
  /total        = sum(sale).

The OUTFILE=* and MODE=ADDVARIABLES specifications are the defaults and can be omitted. Additionally, use OVERWRITE=YES to replace pre-existing variables that have the same names as the new aggregated variables.

Note that when the AGGREGATE command is used in this mode with break groups, cases must be sorted in ascending order of the break variables. The command does this automatically unless the /PRESORTED subcommand is specified.
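
Break groups are specified on the /BREAK subcommand. The following is a minimal sketch, assuming a hypothetical grouping variable named region. Every case receives the total for its own group.

```spss
* Assumption: region is a hypothetical break variable.
aggregate outfile=* mode=addvariables
  /break=region
  /region_total = sum(sale).
```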


Aggregating Cases

Aggregate
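
When OUTFILE names a data set or file instead of the active data set, AGGREGATE writes one case per break group. The following is a minimal sketch, assuming a hypothetical break variable named region and a hypothetical data set name agg.

```spss
* Assumption: agg and region are hypothetical names.
dataset declare agg.

* One output case per value of region.
aggregate outfile=agg
  /break=region
  /total   = sum(sale)
  /n_cases = n.
```

The active data set is left unchanged. The aggregated cases are stored in the agg data set.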

Transforming Long to Wide

The CASESTOVARS command transforms a data set from long to wide format. The ID variable identifies the cases that should be combined into a single case, and an index variable distinguishes the cases within each ID group.

The index variable (or multiple index variables) can be either numeric or string, but all cases must have a non-missing value.

Constants should be specified on the /FIXED subcommand. SPSS will automatically inspect variables and warn about any that appear to be constants but aren't specified here.

The remaining variables will be duplicated for each case that is aggregated together. Extraneous variables can and should be dropped to save time.

casestovars
 /id=ID
 /index=INDEX
 /fixed=INDEX_LEVEL_VARIABLES
 /drop=FOO BAR BAZ.
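
The template above can be filled in as follows. This is a minimal sketch, assuming hypothetical variables: id identifies a person, visit numbers the repeated measurements, birthyear is constant within a person, and notes is not needed in the wide file. The cases are sorted by the ID variable first.

```spss
* Assumption: id, visit, birthyear, and notes are hypothetical variables.
sort cases by id visit.

casestovars
 /id=id
 /index=visit
 /fixed=birthyear
 /drop=notes.
```

Each remaining variable becomes a group of new variables, one per value of visit.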

See a concrete example in the long-to-wide crosswalk.

