Differences between revisions 2 and 13 (spanning 11 versions)

Aggregating Data with SPSS

SPSS offers several commands for computing aggregated statistics and translating datasets into aggregated formats.

Contents

Aggregating Data with SPSS
1. Statistics
2. Wide and Long Data

Statistics

Scratch variables, which are declared with a preceding hash (#) and are never re-initialized between cases, can be used to compute running aggregated statistics.

compute #TotalSales = sum(Sales, #TotalSales).
compute Total_Sales = #TotalSales.
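
As a minimal sketch of how this behaves, the same two commands can be run against a small made-up dataset. The sample values and the clientid variable are assumptions for illustration only.

* Hypothetical sample data, for illustration only.
data list free / clientid Sales.
begin data
1 10
1 20
2 5
end data.
* The scratch variable carries its value from case to case.
compute #TotalSales = sum(Sales, #TotalSales).
compute Total_Sales = #TotalSales.
list.

Because the scratch variable is not re-initialized, Total_Sales should hold a running total (10, 30, 35 with these values), so only the last case carries the overall sum.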

The LEAVE command, which suppresses re-initialization for a given variable, can be used in a similar manner.

compute Total_Sales = sum(Sales, Total_Sales).
leave Total_Sales.

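The AGGREGATE command computes aggregated statistics. With OUTFILE=* and MODE=ADDVARIABLES, the statistics are added to the active dataset as new variables rather than written to a new dataset.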

aggregate outfile=* mode=addvariables
  /Total_Sales = sum(Sales).

Additionally, grouping variables can be specified on the /BREAK subcommand, so that the statistics are computed within each group.

aggregate outfile=* mode=addvariables
  /break=clientid
  /Total_Sales = sum(Sales).
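
To make the effect of /BREAK concrete, here is a small sketch using made-up data. The values are assumptions, not taken from any real dataset.

* Hypothetical sample data, for illustration only.
data list free / clientid Sales.
begin data
1 10
1 20
2 5
end data.
* Each case receives the total for its own clientid group.
aggregate outfile=* mode=addvariables
  /break=clientid
  /Total_Sales = sum(Sales).
list.

With these values, both cases for client 1 should show a Total_Sales of 30 and the single case for client 2 should show 5. The number of cases is unchanged.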

Wide and Long Data

The CASESTOVARS command translates a long dataset into wide format. If the dataset already has an index variable for the within-group sequence, specify it on the /INDEX subcommand.

casestovars
  /id=clientid
  /index=fiscalquarter.

Otherwise, each variable will be spread into sequentially named variables, and the number of new variables depends on the largest group in the data.
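
As a rough sketch of the /INDEX case, assume a made-up long dataset with one case per client and quarter. The sample values are hypothetical, and the SORT CASES step is added as a precaution so that cases for the same clientid are adjacent.

* Hypothetical long data: one case per client per quarter.
data list free / clientid fiscalquarter Sales.
begin data
1 1 10
1 2 20
2 1 5
2 2 15
end data.
* Group the cases for each client together before restructuring.
sort cases by clientid fiscalquarter.
casestovars
  /id=clientid
  /index=fiscalquarter.
list.

The result should have one case per clientid, with Sales spread across one new variable per fiscalquarter value.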

If descriptive statistics for each group are all that is desired from the translation, consider using the AGGREGATE command instead.

dataset declare clients.
aggregate
  /outfile="clients"
  /break=clientid
  /count=N.

The VARSTOCASES command translates a wide dataset into long format.

varstocases
  /make Sales from Sales.1 to Sales.4
  /index=fiscalquarter.
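
A minimal sketch of this direction, assuming a made-up wide dataset with four quarterly Sales variables per client (the values are hypothetical):

* Hypothetical wide data: one case per client.
data list free / clientid Sales.1 Sales.2 Sales.3 Sales.4.
begin data
1 10 20 30 40
2 5 15 25 35
end data.
* Stack the four quarterly variables into a single Sales variable.
varstocases
  /make Sales from Sales.1 to Sales.4
  /index=fiscalquarter.
list.

Each client should then occupy four cases, with fiscalquarter running from 1 to 4 and Sales holding the corresponding quarterly value.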

CategoryRicottone

-  ⇤ ← Revision 2 as of 2022-08-22 12:08:43 → 
  Size: 2973
  Editor: DominicRicottone
  Comment:
+   ← Revision 13 as of 2023-06-09 17:52:50 → ⇥
  Size: 1765
  Editor: DominicRicottone
  Comment:
 Line 1:
-= Aggregating Data =
+= Aggregating Data with SPSS =

SPSS offers several commands for computing aggregated statistics and translating datasets into aggregated formats.
-Line 9:
+Line 11:
-== Calculating higher-level values ==
+== Statistics ==
-Line 11:
+Line 13:
-=== Scratch Variables ===

'''Scratch variables''' are declared with a preceding hash (`#`). These variables are deleted on reaching execution, and are never re-initialized.
+[[SPSS/ScratchVariables|Scratch variables]] can be used to compute aggregated statistics.
-Line 18:
+Line 16:
-compute #total = sum(sale,#total).
compute total = #total.
+compute #TotalSales = sum(Sales, #TotalSales).
compute Total_Sales = #TotalSales.
-Line 22:
+Line 20:
-=== Leave ===

The '''`LEAVE`''' command explicitly suppresses re-initialization for a given variable. In all other contexts, these are normal variables with normal behavior.
+The [[SPSS/Leave|LEAVE]] command can be used in a similar manner.
-Line 29:
+Line 23:
-compute total = sum(sale,total).
leave total.
+compute Total_Sales = sum(Sales, Total_Sales).
leave Total_Sales.
-Line 33:
+Line 27:
-=== Aggregate ===

The '''`AGGREGATE`''' command ''can'' be used to calculate higher-level values.

To calculate the age of the oldest customer from a dataset of sales, try:
+The [[SPSS/Aggregate|AGGREGATE]] command creates a new dataset of aggregated statistics.
-Line 43:
+Line 31:
-  /oldest_customer_age = max(age).
+  /Total_Sales = sum(Sales).
-Line 46:
+Line 34:
-The `OUTFILE=*` and `MODE=ADDVARIABLES` subcommands are the default behavior.
+Additionally it allows for group variables on the `/BREAK` subcommand.
-Line 48:
+Line 36:
-Note: if the calculated variables may collide with pre-existing variable names, use the `OVERWRITE=YES` subcommand.



==== Functions ====

See [[SPSS/AggregateFunctions|Aggregate Functions]].



==== Break Groups ====

To calculate a higher-level value for each subgroup, use the `/BREAK=VARLIST` subcommand. The dataset must be sorted by `VARLIST`.

If the dataset is already sorted, use the `/PRESORTED` subcommand to skip re-processing. The `/PRESORTED` subcommand must precede the `/BREAK` subcommand.
+{{{
aggregate outfile=* mode=addvariables
  /break=clientid
  /Total_Sales = sum(Sales).
}}}
-Line 71:
+Line 46:
-== Aggregating Cases ==
+== Wide and Long Data ==
-Line 73:
+Line 48:
-=== Aggregate ===

The '''`AGGREGATE`''' command ''can'' be used to create a higher-level dataset.

This is mostly only useful with break groups, whereby a case is generated for each one.

To create a dataset with a row for each region, try:

{{{
dataset define NEW.
aggregate outfile=NEW
  /break=region
  /total_sales=sum(sale).
}}}

If the dataset is already sorted, use the `/PRESORTED` subcommand to skip re-processing. The `/PRESORTED` subcommand must precede the `/BREAK` subcommand.


==== Functions ====

See [[SPSS/AggregateFunctions|Aggregate Functions]].



=== Transforming Long to Wide ===

The `CASESTOVARS` command transforms a data set from long to wide format. An index variable identifies the cases that should be aggregated.

The index variable (or multiple index variables) can be either numeric or string, but all cases must have a non-missing value.

Constants should be specified on the `/FIXED` subcommand. SPSS will automatically inspect variables and warn about any that appear to be constants but aren't specified here.

The remaining variables will be duplicated for each case that is aggregated together. Extraneous variables can and should be dropped to save time.
+The [[SPSS/CasesToVars|CASESTOVARS]] command translates a long dataset into wide format. If the dataset already has an index variable for the within-group sequence, specify it on the `/INDEX` subcommand.
-Line 111:
+Line 52:
- /id=ID
 /index=INDEX
 /fixed=INDEX_LEVEL_VARIABLES
 /drop=FOO BAR BAZ.
+  /id=clientid
  /index=fiscalquarter.
-Line 117:
+Line 56:
-See a concrete example in the [[CrosswalkLongToWide#SPSS|long-to-wide crosswalk]].
+Otherwise variables will be spread into an unknowable number of sequentially-named variables.

If case-wise descriptive statistics are all that is desired from the translation, consider instead using the [[SPSS/Aggregate|AGGREGATE]] command.

{{{
dataset declare clients.
aggregate
  /outfile="clients"
  /break=clientid
  /count=N.
}}}

The [[SPSS/VarsToCases|VARSTOCASES]] command translates a wide dataset into long format.

{{{
varstocases
  /make Sales from Sales.1 to Sales.4
  /index=fiscalquarter.
}}}

Diff for "SPSS/AggregatingData"
