Revision 14 as of 2023-06-11 20:45:57
Aggregating Data with SPSS
SPSS offers several commands for computing aggregated statistics and translating datasets into aggregated formats.
Statistics
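Scratch variables, declared with a leading hash (#), are not re-initialized between cases, so they can accumulate a running statistic across the file.
compute #TotalSales = sum(Sales, #TotalSales).
compute Total_Sales = #TotalSales.
The LEAVE command suppresses re-initialization of an ordinary variable and can be used in a similar manner.
compute Total_Sales = sum(Sales, Total_Sales).
leave Total_Sales.
Both forms yield a running total: each case holds the sum of Sales up to and including that case.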
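The running totals above span the whole file. As a minimal sketch, assuming the data also contain a grouping variable named clientid (borrowed from the later examples) and that a per-client running total is wanted, the scratch variable can be reset whenever the group changes. Running_Sales and #RunningSales are hypothetical names.

```spss
* Sort so that each client's cases are adjacent.
sort cases by clientid.
* Reset the accumulator on the first case of each new client.
if ($casenum > 1 and clientid <> lag(clientid)) #RunningSales = 0.
compute #RunningSales = sum(Sales, #RunningSales).
compute Running_Sales = #RunningSales.
execute.
```

The scratch variable starts at zero, so the first case in the file needs no explicit reset.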
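The AGGREGATE command computes aggregated statistics. With OUTFILE=* and MODE=ADDVARIABLES, as here, the results are appended to the active dataset as new variables rather than written to a new dataset.
aggregate outfile=* mode=addvariables
  /Total_Sales = sum(Sales).
Additionally, it accepts grouping variables on the /BREAK subcommand, in which case the statistics are computed within each group.
aggregate outfile=* mode=addvariables
  /break=clientid
  /Total_Sales = sum(Sales).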
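Several statistics can be requested in one pass. The following is a sketch rather than a recommended set: the target names Mean_Sales, Max_Sales, and N_Sales are hypothetical, while clientid and Sales are taken from the example above.

```spss
* Append several per-client statistics to every case of that client.
aggregate outfile=* mode=addvariables
  /break=clientid
  /Mean_Sales = mean(Sales)
  /Max_Sales = max(Sales)
  /N_Sales = n(Sales).
```

Here n(Sales) counts the cases with a non-missing Sales value in each group.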
Wide and Long Data
The CASESTOVARS command translates a long dataset into wide format. If the dataset already has an index variable for the within-group sequence, specify it on the /INDEX subcommand.
casestovars /id=clientid /index=fiscalquarter.
Otherwise, each variable is spread across sequentially named new variables, and their number depends on the size of the largest group, so it is not known in advance.
If per-group descriptive statistics are all that is desired from the translation, consider using the AGGREGATE command instead.
dataset declare clients.
aggregate
  /outfile="clients"
  /break=clientid
  /count=N.
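As a hedged sketch of the indexed form, the following assumes that fiscalquarter takes the values 1 to 4 and that the cases are not yet sorted; CASESTOVARS generally expects cases sorted by the ID variable. The /SEPARATOR subcommand is written out only to make the naming explicit, since a period is the default.

```spss
* Sort by the ID and index variables before restructuring.
sort cases by clientid fiscalquarter.
* Under the stated assumption, Sales is spread into Sales.1 to Sales.4.
casestovars
  /id=clientid
  /index=fiscalquarter
  /separator=".".
```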
The VARSTOCASES command translates a wide dataset into long format.
varstocases /make Sales from Sales.1 to Sales.4 /index=fiscalquarter.
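By default, the restructured file may omit cases whose new variable is missing. The sketch below, which assumes the wide dataset also holds clientid, keeps those cases and names the variable to carry over explicitly.

```spss
* One output case per client and quarter, including missing Sales values.
varstocases
  /make Sales from Sales.1 to Sales.4
  /index=fiscalquarter
  /keep=clientid
  /null=keep.
```

Dropping /null=keep returns to the default of discarding cases with a missing Sales value.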
Data Model
The AGGREGATE command does not recognize SPLIT FILE status.
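In practice this means grouping must be requested on /BREAK even when a split is active. A minimal sketch, assuming a hypothetical grouping variable named region:

```spss
sort cases by region.
split file by region.
* The split alone would not group the aggregation, so region is repeated on BREAK.
aggregate outfile=* mode=addvariables
  /break=region
  /Total_Sales = sum(Sales).
split file off.
```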