Differences between revisions 1 and 14 (spanning 13 versions)
Revision 1 as of 2022-08-22 12:09:20
Size: 6538
Comment:
Revision 14 as of 2024-01-02 17:05:51
Size: 5716
Comment: PSPP 2.0 update
Line 3: Line 3:
These are the functions that can be used on the [[SPSS/AggregatingData#Aggregate|AGGREGATE command]]. The [[SPSS/Aggregate|AGGREGATE]] command creates variables using a mini-programming language that is largely characterized by the below functions.
Line 15: Line 15:
Generally, missing values are ignored. [[SPSS/DataTypes#Strings|Strings]] will never be considered missing.

Generally, a missing value is returned only if ''all'' values are missing. The exception is `SD`, which requires two non-missing values.

PSPP extends this syntax with handling for user-missing values. To include user-missing values in any aggregate function, append a period to the function name. For example, `SUM.`.
Line 21: Line 27:
Only the `MAX`, `MIN`, `FIRST`, and `LAST` functions copy the metadata of a source variable. All other created variables lack metadata by default. Only the `MAX`, `MIN`, `FIRST`, and `LAST` functions copy the metadata of a source variable.

All other created variables lack labels and have a format pre-determined by the function.

 * `F5.3` for functions `FGT`, `FIN`, `FLT`, and `FOUT`
 * `F5.1` for functions `PGT`, `PIN`, `PLT`, and `POUT`
 * `F7.0` for functions `NU` and `NUMISS`
 * `F8.2` for functions `CGT`, `CIN`, `CLT`, `COUT`, `MEAN`, `MEDIAN`, `SD`, `SUM`, `N`, and `NMISS`
   * If weighting is not enabled, `N` and `NMISS` create variables with a format of `F7.0`. In other words, `N` and `NMISS` match the behavior of `NU` and `NUMISS` when weighting is disabled.
Line 33: Line 47:
== Cgt ==

The '''`CGT`''' function returns a count of cases with a value greater than a specified second argument for each source variable.

----
== Functions ==
Line 41: Line 51:
== Cin == === Count functions ===
Line 43: Line 53:
The '''`CIN`''' function returns a count of cases with a value within some range for each source variable. The '''`CGT`''' and '''`CLT`''' functions return a count of cases with a value greater than (`CGT`) or less than (`CLT`) the second argument.
Line 45: Line 55:
The range is specified by the second and third arguments, and it is inclusive of those two values as well. If the second argument is greater than the third, they are automatically reversed. The '''`CIN`''' function returns a count of cases within an inclusive range defined by the second and third arguments. If the second argument is greater than the third, they are automatically reversed. If they are equal, `CIN` operates as an equality.
Line 47: Line 57:
If the second and third arguments are equal, `CIN` returns a count of cases with a value equal to the second argument. The '''`COUT`''' function is the complement of `CIN`.
Line 49: Line 59:
---- Note: These are all unsupported in SPSS version 21 or earlier, and in PSPP version 1.6.2 or earlier.
Line 53: Line 63:
== Clt == === First ===
Line 55: Line 65:
The '''`CLT`''' function returns a count of cases with a value lesser than a specified second argument for each source variable.

----
The '''`FIRST`''' function returns the first non-missing value for each source variable. String values will never be considered missing.
Line 61: Line 69:
== Cout == === Fraction functions ===
Line 63: Line 71:
The '''`COUT`''' function returns a count of cases with a value outside of some range for each source variable. The '''`FGT`''' and '''`FLT`''' functions return a fraction of cases with a value greater than (`FGT`) or less than (`FLT`) the second argument.
Line 65: Line 73:
The range is specified by the second and third arguments, and it is inclusive of those two values as well. If the second argument is greater than the third, they are automatically reversed. The '''`FIN`''' function returns a fraction of cases within an inclusive range defined by the second and third arguments. If the second argument is greater than the third, they are automatically reversed. If they are equal, `FIN` operates as an equality.
Line 67: Line 75:
If the second and third arguments are equal, `COUT` returns a count of cases with a value not equal to the second argument.

`COUT` is complementary of `CIN`.

----
The '''`FOUT`''' function is the complement of `FIN`.
Line 75: Line 79:
== Fgt == === Last ===
Line 77: Line 81:
The '''`FGT`''' function returns a fraction of cases with a value greater than a specified second argument for each source variable.

----
The '''`LAST`''' function returns the last non-missing value for each source variable. String values will never be considered missing.
Line 83: Line 85:
== Fin == === Max ===
Line 85: Line 87:
The '''`FIN`''' function returns a fraction of cases with a value within some range for each source variable. The '''`MAX`''' function returns a maximum non-missing value for each source variable.
Line 87: Line 89:
The range is specified by the second and third arguments, and it is inclusive of those two values as well. If the second argument is greater than the third, they are automatically reversed.

If the second and third arguments are equal, `FIN` returns a fraction of cases with a value equal to the second argument.

----
String values are evaluated according to codepoints. For example, `"Z"` has a higher codepoint than `"A"`, so between the two values the maximum value is `"Z"`. String values will never be considered missing.
Line 95: Line 93:
== First ==

The '''`FIRST`''' function returns the first non-missing value in a break group.

TODO: what happens if specified with a variable list argument?

----



== Flt ==

The '''`FLT`''' function returns a fraction of cases with a value lesser than a specified second argument for each source variable.

----



== Fout ==

The '''`FOUT`''' function returns a fraction of cases with a value outside of some range for each source variable.

The range is specified by the second and third arguments, and it is inclusive of those two values as well. If the second argument is greater than the third, they are automatically reversed.

If the second and third arguments are equal, `FOUT` returns a fraction of cases with a value not equal to the second argument.

`FOUT` is complementary of `FIN`.

----



== Last ==

The '''`LAST`''' function returns the last non-missing value in a break group.

TODO: what happens if specified with a variable list argument?

----



== Mean ==
=== Mean ===
Line 143: Line 99:
----

=== Median ===

The '''`MEDIAN`''' function returns a median value for each source variable.

Note: only valid for numeric variables.
Line 147: Line 109:
== Max == === Min ===
Line 149: Line 111:
The '''`MAX`''' function returns a maximum value for each source variable. The '''`MIN`''' function returns a minimum non-missing value for each source variable.
Line 151: Line 113:
---- String values are evaluated according to codepoints. For example, `"A"` has a lower codepoint than `"Z"`, so between the two values the minimum value is `"A"`. String values will never be considered missing.
Line 155: Line 117:
== Median == === N ===
Line 157: Line 119:
The '''`MEDIAN`''' function returns a median value for each source variable. The '''`N`''' function returns a weighted number of cases in a break group.
Line 159: Line 121:
---- If specified with a variable list argument, the `N` function returns a weighted number of cases with non-missing values for each source variable. String values will never be considered missing.
Line 163: Line 125:
== Min == === Nmiss ===
Line 165: Line 127:
The '''`MIN`''' function returns a minimum value for each source variable.

----
The '''`NMISS`''' function returns a weighted number of cases with missing values for each source variable. String values will never be considered missing.
Line 171: Line 131:
== N == === Nu ===
Line 173: Line 133:
The '''`N`''' function returns a weighted number of cases in a break group. The '''`NU`''' function returns an unweighted number of cases in a break group.
Line 175: Line 135:
If specified with a variable list argument, the `N` function returns a weighted number of cases with non-missing values for each source variable.

----
If specified with a variable list argument, the `NU` function returns an unweighted number of cases with non-missing values for each source variable. String values will never be considered missing.
Line 181: Line 139:
== Nmiss == === Numiss ===
Line 183: Line 141:
The '''`NMISS`''' function returns a weighted number of missing cases in a break group.

TODO: what happens if specified with a variable list argument?

----
The '''`NUMISS`''' function returns an unweighted number of cases with missing values for each source variable. String values will never be considered missing.
Line 191: Line 145:
== Nu == === Percentage functions ===
Line 193: Line 147:
The '''`NU`''' function returns an unweighted number of cases in a break group. The '''`PGT`''' and '''`PLT`''' functions return a percentage of cases with a value greater than (`PGT`) or less than (`PLT`) the second argument.
Line 195: Line 149:
If specified with a variable list argument, the `NU` function returns an unweighted number of cases with non-missing values for each source variable. The '''`PIN`''' function returns a percentage of cases with a value within some range for each source variable. If the second argument is greater than the third, they are automatically reversed. If they are equal, `PIN` operates as an equality.
Line 197: Line 151:
---- The '''`POUT`''' function is the complement of `PIN`.
Line 201: Line 155:
== Numiss ==

The '''`NUMISS`''' function returns an unweighted number of missing cases in a break group.

TODO: what happens if specified with a variable list argument?

----



== Pgt ==

The '''`PGT`''' function returns a percentage of cases with a value greater than a specified second argument for each source variable.

----



== Pin ==

The '''`PIN`''' function returns a percentage of cases with a value within some range for each source variable.

The range is specified by the second and third arguments, and it is inclusive of those two values as well. If the second argument is greater than the third, they are automatically reversed.

If the second and third arguments are equal, `PIN` returns a percentage of cases with a value equal to the second argument.

----



== Plt ==

The '''`PLT`''' function returns a percentage of cases with a value lesser than a specified second argument for each source variable.

----



== Pout ==

The '''`POUT`''' function returns a percentage of cases with a value outside of some range for each source variable.

The range is specified by the second and third arguments, and it is inclusive of those two values as well. If the second argument is greater than the third, they are automatically reversed.

If the second and third arguments are equal, `POUT` returns a percentage of cases with a value not equal to the second argument.

`POUT` is complementary of `PIN`.

----



== SD ==
=== SD ===
Line 259: Line 161:
----
Line 262: Line 163:

== Sum ==
=== Sum ===

SPSS Aggregate Functions

The AGGREGATE command creates variables using a mini-programming language that is largely characterized by the below functions.


General Syntax

The number of target variables must match the number of source variables.

Generally, missing values are ignored. Strings will never be considered missing.

Generally, a missing value is returned only if all values are missing. The exception is SD, which requires two non-missing values.
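The missing-value rule can be modeled in a few lines. This is a minimal Python sketch of the semantics, not SPSS or PSPP code: `None` stands in for a missing value, the helper names `agg_sum` and `agg_sd` are hypothetical, and the choice of the sample (n-1) standard deviation is an assumption.

```python
import statistics

def agg_sum(values):
    """Model of SUM: ignore missing values (None); missing only if all are missing."""
    valid = [v for v in values if v is not None]
    return sum(valid) if valid else None

def agg_sd(values):
    """Model of SD: needs at least two non-missing values, else missing.

    Assumption: the sample standard deviation (n-1 denominator) is used.
    """
    valid = [v for v in values if v is not None]
    return statistics.stdev(valid) if len(valid) >= 2 else None
```

With a single non-missing value, `agg_sum` still returns that value while `agg_sd` returns `None`, which is the exception described above.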

PSPP extends this syntax with handling for user-missing values. To include user-missing values in any aggregate function, append a period to the function name. For example, the user-missing-inclusive form of SUM is written SUM. (note the trailing period).
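The effect of the trailing period can be pictured as a switch that decides whether user-missing codes count as valid data. The following Python sketch is illustrative only: the function name `agg_sum`, its parameters, and the example code `-99` are all hypothetical, and `None` stands in for a system-missing value.

```python
def agg_sum(values, user_missing=frozenset(), include_user_missing=False):
    """Model of SUM (include_user_missing=False) versus SUM. (True).

    user_missing is the set of codes declared as user-missing for the
    source variable. System-missing values (None) are always ignored.
    """
    valid = [
        v for v in values
        if v is not None and (include_user_missing or v not in user_missing)
    ]
    return sum(valid) if valid else None
```

For a variable where `-99` is declared user-missing, `agg_sum([1, 2, -99], {-99})` models `SUM` and `agg_sum([1, 2, -99], {-99}, include_user_missing=True)` models `SUM.`.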


Variable Metadata

Only the MAX, MIN, FIRST, and LAST functions copy the metadata of a source variable.

All other created variables lack labels and have a format pre-determined by the function.

  • F5.3 for functions FGT, FIN, FLT, and FOUT

  • F5.1 for functions PGT, PIN, PLT, and POUT

  • F7.0 for functions NU and NUMISS

  • F8.2 for functions CGT, CIN, CLT, COUT, MEAN, MEDIAN, SD, SUM, N, and NMISS

    • If weighting is not enabled, N and NMISS create variables with a format of F7.0. In other words, N and NMISS match the behavior of NU and NUMISS when weighting is disabled.
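The format rules listed above amount to a lookup table with one special case for weighting. A minimal Python sketch follows; the names `FORMAT_BY_FUNCTION`, `COPIES_SOURCE`, and `target_format` are hypothetical and exist only to restate the list, not to describe how SPSS or PSPP is implemented.

```python
# Formats assigned to target variables, as listed in this section.
FORMAT_BY_FUNCTION = {
    "F5.3": ("FGT", "FIN", "FLT", "FOUT"),
    "F5.1": ("PGT", "PIN", "PLT", "POUT"),
    "F7.0": ("NU", "NUMISS"),
    "F8.2": ("CGT", "CIN", "CLT", "COUT", "MEAN", "MEDIAN",
             "SD", "SUM", "N", "NMISS"),
}

# These functions copy the source variable's metadata instead.
COPIES_SOURCE = ("MAX", "MIN", "FIRST", "LAST")

def target_format(function, source_format=None, weighting=True):
    """Return the format of the variable created by an aggregate function."""
    name = function.upper().rstrip(".")  # SUM. uses the same format as SUM
    if name in COPIES_SOURCE:
        return source_format
    if name in ("N", "NMISS") and not weighting:
        return "F7.0"  # matches NU and NUMISS when weighting is disabled
    for fmt, names in FORMAT_BY_FUNCTION.items():
        if name in names:
            return fmt
    raise ValueError(f"unknown aggregate function: {function}")
```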

To specify a variable label for a new target variable, list the label in quotes following the new variable name.

Value labels cannot be specified.

Variable formats cannot be specified.


Functions

Count functions

The CGT and CLT functions return a count of cases with a value greater than (CGT) or less than (CLT) the second argument.

The CIN function returns a count of cases within an inclusive range defined by the second and third arguments. If the second argument is greater than the third, they are automatically reversed. If they are equal, CIN operates as an equality.

The COUT function is the complement of CIN.

Note: These are all unsupported in SPSS version 21 or earlier, and in PSPP version 1.6.2 or earlier.
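The behavior of the four count functions, including the automatic reversal of range bounds, can be sketched in Python. This is an illustration of the semantics described above rather than SPSS or PSPP code: the lowercase function names are hypothetical, `None` stands in for a missing value, and case weighting is ignored for simplicity.

```python
def cgt(values, threshold):
    """Count of non-missing values greater than the threshold."""
    return sum(1 for v in values if v is not None and v > threshold)

def clt(values, threshold):
    """Count of non-missing values less than the threshold."""
    return sum(1 for v in values if v is not None and v < threshold)

def cin(values, low, high):
    """Count of non-missing values in the inclusive range [low, high]."""
    if low > high:  # reversed bounds are swapped automatically
        low, high = high, low
    return sum(1 for v in values if v is not None and low <= v <= high)

def cout(values, low, high):
    """Complement of cin: count of non-missing values outside the range."""
    if low > high:
        low, high = high, low
    return sum(1 for v in values if v is not None and (v < low or v > high))
```

When both bounds are equal, `cin` reduces to a count of values equal to that bound, and `cin` plus `cout` always equals the number of non-missing values.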

First

The FIRST function returns the first non-missing value for each source variable. String values will never be considered missing.

Fraction functions

The FGT and FLT functions return a fraction of cases with a value greater than (FGT) or less than (FLT) the second argument.

The FIN function returns a fraction of cases within an inclusive range defined by the second and third arguments. If the second argument is greater than the third, they are automatically reversed. If they are equal, FIN operates as an equality.

The FOUT function is the complement of FIN.
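A Python sketch of the fraction functions is given below. It is a model only: the lowercase function names and the `_valid` helper are hypothetical, `None` stands in for a missing value, weighting is ignored, and taking the denominator to be the number of non-missing cases is an assumption.

```python
def _valid(values):
    """Drop missing values (None)."""
    return [v for v in values if v is not None]

def fgt(values, threshold):
    """Fraction of non-missing values greater than the threshold."""
    valid = _valid(values)
    return sum(v > threshold for v in valid) / len(valid) if valid else None

def flt(values, threshold):
    """Fraction of non-missing values less than the threshold."""
    valid = _valid(values)
    return sum(v < threshold for v in valid) / len(valid) if valid else None

def fin(values, low, high):
    """Fraction of non-missing values in the inclusive range."""
    low, high = min(low, high), max(low, high)  # reversed bounds are swapped
    valid = _valid(values)
    return sum(low <= v <= high for v in valid) / len(valid) if valid else None

def fout(values, low, high):
    """Complement of fin."""
    inside = fin(values, low, high)
    return None if inside is None else 1 - inside
```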

Last

The LAST function returns the last non-missing value for each source variable. String values will never be considered missing.

Max

The MAX function returns a maximum non-missing value for each source variable.

String values are evaluated according to codepoints. For example, "Z" has a higher codepoint than "A", so between the two values the maximum value is "Z". String values will never be considered missing.

Mean

The MEAN function returns a mean across cases for each source variable.

Note: only valid for numeric variables.

Median

The MEDIAN function returns a median value for each source variable.

Note: only valid for numeric variables.

Min

The MIN function returns a minimum non-missing value for each source variable.

String values are evaluated according to codepoints. For example, "A" has a lower codepoint than "Z", so between the two values the minimum value is "A". String values will never be considered missing.
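The codepoint ordering described for MAX and MIN matches how Python compares strings, so it can be demonstrated directly. This is a hedged sketch: `agg_max` and `agg_min` are hypothetical names, `None` stands in for a missing numeric value, and any padding of fixed-width strings is not modeled.

```python
def agg_max(values):
    """Maximum non-missing value; strings compare by codepoint."""
    valid = [v for v in values if v is not None]
    return max(valid) if valid else None

def agg_min(values):
    """Minimum non-missing value; strings compare by codepoint."""
    valid = [v for v in values if v is not None]
    return min(valid) if valid else None
```

One consequence of codepoint ordering is that every uppercase ASCII letter sorts below every lowercase one, so in this model `agg_max(["Z", "a"])` is `"a"`.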

N

The N function returns a weighted number of cases in a break group.

If specified with a variable list argument, the N function returns a weighted number of cases with non-missing values for each source variable. String values will never be considered missing.

Nmiss

The NMISS function returns a weighted number of cases with missing values for each source variable. String values will never be considered missing.

Nu

The NU function returns an unweighted number of cases in a break group.

If specified with a variable list argument, the NU function returns an unweighted number of cases with non-missing values for each source variable. String values will never be considered missing.

Numiss

The NUMISS function returns an unweighted number of cases with missing values for each source variable. String values will never be considered missing.
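The difference between the weighted (N, NMISS) and unweighted (NU, NUMISS) counts can be sketched as follows. This Python model is illustrative only: each case is a dictionary, the weight key `"w"` and the lowercase function names are hypothetical, and `None` stands in for a missing value.

```python
def n(cases, var=None, weight="w"):
    """Weighted number of cases; with var, only cases where var is non-missing."""
    return sum(c[weight] for c in cases if var is None or c[var] is not None)

def nu(cases, var=None):
    """Unweighted number of cases; with var, only cases where var is non-missing."""
    return sum(1 for c in cases if var is None or c[var] is not None)

def nmiss(cases, var, weight="w"):
    """Weighted number of cases where var is missing."""
    return sum(c[weight] for c in cases if c[var] is None)

def numiss(cases, var):
    """Unweighted number of cases where var is missing."""
    return sum(1 for c in cases if c[var] is None)
```

If every case has a weight of 1, the weighted and unweighted functions return the same counts in this model.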

Percentage functions

The PGT and PLT functions return a percentage of cases with a value greater than (PGT) or less than (PLT) the second argument.

The PIN function returns a percentage of cases with a value within some range for each source variable. If the second argument is greater than the third, they are automatically reversed. If they are equal, PIN operates as an equality.

The POUT function is the complement of PIN.

SD

The SD function returns a standard deviation across cases for each source variable.

Note: only valid for numeric variables.

Sum

The SUM function returns a sum across cases for each source variable.

Note: only valid for numeric variables.


CategoryRicottone

SPSS/Aggregate/Functions (last edited 2024-01-02 17:05:51 by DominicRicottone)