Differences between revisions 4 and 13 (spanning 9 versions)

Aggregating Data with SPSS

SPSS offers several commands for computing aggregated statistics and translating datasets into aggregated formats.

Contents

Aggregating Data with SPSS
1. Statistics
2. Wide and Long Data

Statistics

Scratch variables can be used to compute aggregated statistics.

compute #TotalSales = sum(Sales, #TotalSales).
compute Total_Sales = #TotalSales.

The LEAVE command can be used in a similar manner.

compute Total_Sales = sum(Sales, Total_Sales).
leave Total_Sales.

The AGGREGATE command creates a new dataset of aggregated statistics.

aggregate outfile=* mode=addvariables
  /Total_Sales = sum(Sales).

Additionally it allows for group variables on the /BREAK subcommand.

aggregate outfile=* mode=addvariables
  /break=clientid
  /Total_Sales = sum(Sales).

Wide and Long Data

The CASESTOVARS command translates a long dataset into wide format. If the dataset already has an index variable for the within-group sequence, specify it on the /INDEX subcommand.

casestovars
  /id=clientid
  /index=fiscalquarter.

Otherwise variables will be spread into an unknowable number of sequentially-named variables.

If case-wise descriptive statistics are all that is desired from the translation, consider instead using the AGGREGATE command.

dataset declare clients.
aggregate
  /outfile="clients"
  /break=clientid
  /count=N.

The VARSTOCASES command translates a wide dataset into long format.

varstocases
  /make Sales from Sales.1 to Sales.4
  /index=fiscalquarter.

CategoryRicottone

-  ⇤ ← Revision 4 as of 2022-08-22 19:53:29 → 
  Size: 3873
  Editor: DominicRicottone
  Comment:
+   ← Revision 13 as of 2023-06-09 17:52:50 → ⇥
  Size: 1765
  Editor: DominicRicottone
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 1:
-= Aggregating Data =
+= Aggregating Data with SPSS =

SPSS offers several commands for computing aggregated statistics and translating datasets into aggregated formats.
-Line 9:
+Line 11:
-== Calculating higher-level values ==
+== Statistics ==
-Line 11:
+Line 13:
-=== Scratch Variables ===

'''Scratch variables''' are declared with a preceding hash (`#`). These variables are deleted on reaching execution, and are never re-initialized.

Given sales data, to compute total sales, try:
+[[SPSS/ScratchVariables|Scratch variables]] can be used to compute aggregated statistics.
-Line 20:
+Line 16:
-compute #Total_Sales = sum(Sales, #Total_Sales).
compute Total_Sales = #Total_Sales.
+compute #TotalSales = sum(Sales, #TotalSales).
compute Total_Sales = #TotalSales.
-Line 24:
+Line 20:
-=== Leave ===

The '''`LEAVE`''' command explicitly suppresses re-initialization for a given variable. In all other contexts, these are normal variables with normal behavior.

Given sales data, to compute total sales, try:
+The [[SPSS/Leave|LEAVE]] command can be used in a similar manner.
-Line 37:
+Line 27:
-=== Aggregate ===

The '''`AGGREGATE`''' command ''can'' be used to calculate higher-level values.

Given sales data, to compute total sales, try:
+The [[SPSS/Aggregate|AGGREGATE]] command creates a new dataset of aggregated statistics.
-Line 50:
+Line 34:
-The `OUTFILE=*` and `MODE=ADDVARIABLES` subcommands are the default behavior.
+Additionally it allows for group variables on the `/BREAK` subcommand.
-Line 52:
+Line 36:
-Note: if the calculated variables may collide with pre-existing variable names, use the `OVERWRITE=YES` subcommand.



==== Functions ====

See [[SPSS/AggregateFunctions|Aggregate Functions]].



==== Break Groups ====

To calculate a higher-level value for each subgroup, use the `/BREAK=VARLIST` subcommand. The dataset must be sorted by `VARLIST`.

If the dataset is already sorted, use the `/PRESORTED` subcommand to skip re-processing. The `/PRESORTED` subcommand must precede the `/BREAK` subcommand.
+{{{
aggregate outfile=* mode=addvariables
  /break=clientid
  /Total_Sales = sum(Sales).
}}}
-Line 75:
+Line 46:
-== Aggregating Cases ==
+== Wide and Long Data ==
-Line 77:
+Line 48:
-=== Aggregate ===

The '''`AGGREGATE`''' command ''can'' be used to create a higher-level dataset.

This is mostly only useful with break groups, whereby a case is generated for each one.

Given state-level sales data, to create region-level sales data, try:

{{{
dataset define NEW.
aggregate outfile=NEW
  /break=Region
  /Total_Sales=sum(Sale).
}}}

If the dataset is already sorted, use the `/PRESORTED` subcommand to skip re-processing. The `/PRESORTED` subcommand must precede the `/BREAK` subcommand.



==== Functions ====

See [[SPSS/AggregateFunctions|Aggregate Functions]].



=== Transforming Long to Wide ===

The '''`CASESTOVARS`''' command transforms data from long to wide format. An index variable identifies the cases that should be aggregated.

Given region-level sales data from multiple years, try:

{{{
dataset activate SALES_Y1.
compute Year=1.

dataset activate SALES_Y2.
compute Year=2.

dataset activate SALES_Y3.
compute Year=3.

add files /file=SALES_Y1 /file=SALES_Y2 /file=SALES_Y3.
sort cases by Region.
formats Year (F1).
dataset name SALES.
casestovars
 /id=Region
 /index=Year.
}}}

The ID variable (`Region`) becomes a unique identifier for each row of data. The index variable (`Year`) is dropped, and all other variables are transformed according to the index variable. For example, the `Total_Sales` column becomes a set of columns, one for each year (`Total_Sales.1`, `Total_Sales.2`, `Total_Sales.3`).

The index variables can be either numeric or string, but all cases must have a non-missing value.



==== Constants ====

If a variable does not vary within the new rows of data, they should instead be treated as constants and should not be transformed. Constants should be specified on the `/FIXED` subcommand. SPSS will automatically inspect variables and warn about any that appear to be constants but were not specified as such.

Consider:
+The [[SPSS/CasesToVars|CASESTOVARS]] command translates a long dataset into wide format. If the dataset already has an index variable for the within-group sequence, specify it on the `/INDEX` subcommand.
-Line 143:
+Line 52:
- /id=Region
 /index=Year
 /fixed=Region_Name.
+  /id=clientid
  /index=fiscalquarter.
-Line 148:
+Line 56:
-`Region_Name` would not be duplicated for each index level.
+Otherwise variables will be spread into an unknowable number of sequentially-named variables.
-Line 150:
+Line 58:
+If case-wise descriptive statistics are all that is desired from the translation, consider instead using the [[SPSS/Aggregate|AGGREGATE]] command.
-Line 151:
+Line 60:
+{{{
dataset declare clients.
aggregate
  /outfile="clients"
  /break=clientid
  /count=N.
}}}
-Line 152:
+Line 68:
-=== Transforming Wide to Long ===
+The [[SPSS/VarsToCases|VARSTOCASES]] command translates a wide dataset into long format.
-Line 154:
+Line 70:
-The '''`VARSTOCASES`''' command transforms data from wide to long format.
+{{{
varstocases
  /make Sales from Sales.1 to Sales.4
  /index=fiscalquarter.
}}}

Diff for "SPSS/AggregatingData"

Aggregating Data with SPSS

Statistics

Wide and Long Data