Differences between revisions 1 and 7 (spanning 6 versions)

SPSS Match Files

The MATCH FILES command joins datasets.

Usage

match files
  /file=left
  /file=right
  /by KEYVARS.

The final dataset contains all cases (rows) and variables (columns) from all datasets. Variables appear in order, taking the datasets in the order they are listed. If a variable is present in more than one dataset, its values are taken from the first dataset it appears in. Its metadata (variable label, value labels, or missing values) is taken from the first dataset that sets any such metadata for it.

The key variables must be defined with the same format in each dataset. They must also uniquely identify cases in each dataset. When the /TABLE subcommand is used, uniqueness is required only of the dataset named on /TABLE.

For string key variables, the same format also means the same length, except in PSPP version 2.0 or later.

Each dataset must be presorted by the key variables.
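Because every dataset is presorted and its key values are unique, the join can be done as a single merge pass. That pass walks all datasets in step and emits one case per key value. The Python sketch below emulates the rules above on lists of dicts so the behavior can be checked by hand. It is an illustration, not how SPSS or PSPP implement MATCH FILES.

The following are all assumptions of this sketch:

* the function name `match_files`
* the dict-per-case representation
* `None` as a stand-in for system-missing
* reading "first dataset" as the first dataset that contains the current case

```python
from typing import Any

Row = dict[str, Any]


def match_files(datasets: list[list[Row]], keys: list[str]) -> list[Row]:
    """Join presorted datasets on unique key values, keeping every key seen."""

    def key_of(row: Row) -> tuple:
        return tuple(row[k] for k in keys)

    # Check the documented preconditions: each dataset is sorted by the
    # key variables and no key value repeats within a dataset.
    for rows in datasets:
        ks = [key_of(r) for r in rows]
        if any(a >= b for a, b in zip(ks, ks[1:])):
            raise ValueError("dataset is not sorted by unique key values")

    # Output variables: in order, taking the datasets in order.
    columns: list[str] = []
    for rows in datasets:
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)

    positions = [0] * len(datasets)  # next unread row in each dataset
    result: list[Row] = []
    while True:
        heads = [rows[p] for rows, p in zip(datasets, positions) if p < len(rows)]
        if not heads:
            break
        current = min(key_of(r) for r in heads)  # smallest pending key value
        out: Row = {name: None for name in columns}  # None stands in for missing
        assigned: set[str] = set()
        for i, rows in enumerate(datasets):
            p = positions[i]
            if p < len(rows) and key_of(rows[p]) == current:
                for name, value in rows[p].items():
                    if name not in assigned:  # first dataset with this case wins
                        out[name] = value
                        assigned.add(name)
                positions[i] += 1
        result.append(out)
    return result


left = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}]
right = [{"id": 2, "x": "B", "y": 20}, {"id": 3, "x": "C", "y": 30}]
for row in match_files([left, right], ["id"]):
    print(row)
```

For id 2, x is taken from left because left is listed first, while y can only come from right. Metadata handling is not modeled.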

File

Each /FILE subcommand takes one of:

a star (*), indicating the active dataset
the name of a dataset
a filename or file handle

If the active dataset is included in the join and referenced by a star (*), it is modified in place by the join.

If the active dataset is included in a join and referenced by name, the final dataset will retain the name.

If the active dataset is not included, the final dataset is unnamed and becomes the active dataset.

Table

The /TABLE subcommand extends MATCH FILES so that it can join lookup tables. Each case in the /FILE dataset picks up the variables of the table row with the same key values.

match files
  /file=foo
  /table=states
  /by statecode.

Each /TABLE subcommand takes one of:

the name of a dataset
a filename or file handle
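A table join behaves like a keyed lookup, which the Python sketch below emulates with a dict rather than a sorted merge, for brevity. It assumes, as the SPSS documentation describes for TABLE, that a table contributes variables but not cases, so table rows without a matching case are dropped. Only the table needs unique keys; several cases may share one. The function `table_lookup`, the `statename` variable, and the sample data are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def table_lookup(cases: list[Row], table: list[Row], keys: list[str]) -> list[Row]:
    """Attach a lookup table's variables to each case by key value."""

    def key_of(row: Row) -> tuple:
        return tuple(row[k] for k in keys)

    lookup: dict[tuple, Row] = {}
    for row in table:
        k = key_of(row)
        if k in lookup:
            # Only the table needs unique keys; cases may share a key.
            raise ValueError(f"duplicate key in table: {k}")
        lookup[k] = row

    # Table variables, in order, without duplicates.
    table_columns: list[str] = []
    for row in table:
        for name in row:
            if name not in table_columns:
                table_columns.append(name)

    result: list[Row] = []
    for case in cases:  # one output row per case; unmatched table rows are dropped
        out = dict(case)
        match = lookup.get(key_of(case))
        for name in table_columns:
            if name not in out:  # variables already in the case keep their values
                out[name] = match.get(name) if match is not None else None
        result.append(out)
    return result


foo = [
    {"id": 1, "statecode": "IL"},
    {"id": 2, "statecode": "IL"},
    {"id": 3, "statecode": "WI"},
]
states = [
    {"statecode": "IL", "statename": "Illinois"},
    {"statecode": "OH", "statename": "Ohio"},
]
for row in table_lookup(foo, states, ["statecode"]):
    print(row)
```

Both IL cases receive the same table values. The WI case gets a missing statename, and the OH table row does not appear in the output.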

In

The /IN subcommand must immediately follow a /FILE subcommand. It creates a flag variable for that dataset: 1 for each case present in that dataset, 0 otherwise.

The flag variable is never missing, and it is added at the end of the final dataset's variables.
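Conceptually, an /IN flag is a membership test on key values. The minimal Python sketch below computes it from lists of key tuples and ignores every other variable. The helper `in_flags` and the sample keys are hypothetical.

```python
def in_flags(all_keys: list[tuple], dataset_keys: list[tuple]) -> list[int]:
    """1 where the dataset has a case with the key, else 0 (never missing)."""
    present = set(dataset_keys)
    return [1 if k in present else 0 for k in all_keys]


# Keys of the final dataset from joining /file=left and /file=right by id.
left_keys = [(1,), (2,)]
right_keys = [(2,), (3,)]
output_keys = sorted(set(left_keys) | set(right_keys))
print(in_flags(output_keys, left_keys))   # [1, 1, 0]
print(in_flags(output_keys, right_keys))  # [0, 1, 1]
```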

Rename

The /RENAME subcommand renames variables in the dataset named on the /FILE subcommand immediately preceding it.

Renames take place before the datasets are joined, so a key variable can be renamed here to the name used on /BY.

First and Last

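The /FIRST and /LAST subcommands append flag variables that mark the first and last case in each group of cases sharing the same key values. When several datasets are joined, this is generally only useful with /TABLE joins, where the /FILE dataset may contain several cases with the same key values.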

match files
  /file=population
  /table=households
  /by id
  /first=headofhousehold.

MATCH FILES can also be run on a single dataset. In that case, these subcommands mark where each group of cases begins and ends, which can be used to number the cases within each group.

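* The file must be sorted by the key variable first.
sort cases by id.
match files
  /file=*
  /by id
  /first=PrimaryFirst
  /last=PrimaryLast.
* Number cases within each id group (a single-case group gets 0).
do if PrimaryFirst=1.
  compute MatchSequence = 1 - PrimaryLast.
else.
  compute MatchSequence = MatchSequence + 1.
end if.
* LEAVE carries MatchSequence over from one case to the next.
leave MatchSequence.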
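In the syntax above, LEAVE keeps MatchSequence from the previous case instead of resetting it to system-missing. That is what lets the ELSE branch count up within each id group. Because the first case of a group gets 1 - PrimaryLast, a group with only one case is numbered 0, while larger groups are numbered 1, 2, 3, and so on.

The Python sketch below reproduces that logic on a sorted list of ids to make the numbering explicit. The function names `first_last_flags` and `match_sequence` are hypothetical.

```python
def first_last_flags(ids: list) -> list[tuple[int, int]]:
    """(first, last) flags per case, like /FIRST and /LAST on sorted ids."""
    flags = []
    for i, current in enumerate(ids):
        first = int(i == 0 or ids[i - 1] != current)
        last = int(i == len(ids) - 1 or ids[i + 1] != current)
        flags.append((first, last))
    return flags


def match_sequence(ids: list) -> list[int]:
    """Mirror the DO IF / LEAVE logic: singletons get 0, groups count 1, 2, ..."""
    sequence: list[int] = []
    previous = 0  # plays the role of the value carried over by LEAVE
    for first, last in first_last_flags(ids):
        previous = 1 - last if first else previous + 1
        sequence.append(previous)
    return sequence


print(first_last_flags([7, 8, 8, 8, 9]))  # [(1, 1), (1, 0), (0, 0), (0, 1), (1, 1)]
print(match_sequence([7, 8, 8, 8, 9]))    # [0, 1, 2, 3, 0]
```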

Keep and Drop

The /KEEP and /DROP subcommands specify a list of variables to keep or drop from the final dataset.

Any variable created by a /IN, /FIRST, or /LAST subcommand cannot be dropped.

Data Model

The MATCH FILES command reads all datasets and data files named on /FILE and /TABLE subcommands.

The MATCH FILES command recognizes FILTER status and preserves it in the final dataset. Filtered-out cases are not dropped; they are still included in the final dataset.

-  Revision 1 as of 2023-01-13 23:10:43
  Size: 1744
  Editor: DominicRicottone
  Comment:
+   Revision 7 as of 2024-01-02 17:09:03
  Size: 3829
  Editor: DominicRicottone
  Comment: PSPP 2.0 update
-Deletions are marked like this.
+Additions are marked like this.
 Line 2:
+The '''`MATCH FILES`''' command joins datasets.
-Line 11:
+Line 13:
-To join two datasets, try:
 Line 15:
-  /file=LEFT
  /file=RIGHT
  /by KEYVARLIST.
+  /file=left
  /file=right
  /by KEYVARS.
 Line 20:
-The final dataset contains all rows and variables from all datasets. Variables are taken in order from the datasets in order. For variables originating from more than one dataset, values are taken from the first dataset they appear in and metadata is taken from the first dataset with any (i.e. variable label, value labels, or missing values) metadata set.
+The final dataset contains all rows and columns from all datasets. Variables are taken in order, from datasets in order. If a variable is present in more than one dataset, values are taken from the first dataset they appear in and metadata is taken from the first dataset with any (i.e. variable label, value labels, or missing values) metadata set.

The key variables must be defined with the same format in each dataset. Cases must be uniquely identified by the key variables in each dataset, except if using the '''`/TABLE`''' subcommand, in which case this is only required of the dataset specified on the `/TABLE` subcommand itself.

The requirement for same format includes a requirement for same length of string variables, except in PSPP version 2.0 or later.

Each dataset must be presorted by the key variables.
-Line 24:
+Line 30:
-=== File Subcommand ===
+=== File ===
-Line 26:
+Line 32:
-Each `/FILE` subcommand takes one of:
+Each '''`/FILE`''' subcommand takes one of:
-Line 30:
+Line 36:
- * a filename or file handle
+ * a filename or [[SPSS/FileHandle|file handle]]
-Line 32:
+Line 38:
-If the active dataset is included in a join and referenced by a star (`*`), that dataset will be replaced in-place by the join.
+If the active dataset is included in a join and referenced by a star (`*`), that dataset will be modified in-place by the join.

If the active dataset is included in a join and referenced by name, the final dataset will retain the name.

If the active dataset is not included, the final dataset is unnamed and becomes the active dataset.
-Line 36:
+Line 46:
-=== By Subcommand ===

The `/BY` subcommand specifies how cases are uniquely identified. The `KEYVARLIST` can be one or more variables.

The following are required of key variables:

 * They must be defined and have the same format in each file (including length for string variables)
 * They must uniquely identify a case in each file
 * Each file must be pre-sorted by them

If the `/TABLE` subcommand is used, the key variables specified on the `/BY` subcommand ''only'' need to uniquely identify a case in each table.



=== Table Subcommand ===
+=== Table ===
-Line 56:
+Line 52:
-  /file=LEFT
  /table=LOOKUP
  /by ID.
+  /file=foo
  /table=states
  /by statecode.
-Line 68:
+Line 64:
+=== In ===

The '''`/IN`''' subcommand must immediately follow a `/FILE` subcommand. It creates a flag variable for that dataset: 1 for any case that is present in it, 0 otherwise.

The flag variable will be non-missing for all cases in the final dataset and will be appended to the end of the variables.



=== Rename ===

The '''`/RENAME`''' subcommand applies [[SPSS/RenameVariables|renames]] to the `/FILE` subcommand preceding it.

These renames take place before the datasets are joined. The key variables can be renamed to their final names.



=== First and Last ===

The '''`/FIRST`''' and '''`/LAST`''' subcommands append flag variables that mark the first and last matches by the key variables. This is generally only useful with `/TABLE` joins.

{{{
match files
  /file=population /first=headofhousehold
  /table=households
  /by id.
}}}

The `MATCH FILES` command can be used with a single dataset, in which case these subcommands can be used to mark the sequence of cases within a group.

{{{
match files
 /file=*
 /by id
 /first=PrimaryFirst
 /last=PrimaryLast.
do if PrimaryFirst=1.
  compute MatchSequence = 1 - PrimaryLast.
else.
  compute MatchSequence = MatchSequence + 1.
end if.
leave MatchSequence.
}}}



=== Keep and Drop ===

The '''`/KEEP`''' and '''`/DROP`''' subcommands specify a list of variables to keep or drop from the final dataset.

Any variable created by a `/IN`, `/FIRST`, or `/LAST` subcommand cannot be dropped.

----



== Data Model ==

The `MATCH FILES` command reads all datasets and data files named on `/FILE`/`/TABLE` subcommands.

The `MATCH FILES` command recognizes [[SPSS/Filter|FILTER]] status and preserves it, although filtered cases are included in the final dataset.

----



== See also ==

[[https://www.gnu.org/software/pspp/manual/html_node/MATCH-FILES.html|PSPP manual for MATCH FILES]]

Diff for "SPSS/MatchFiles"