SPSS Match Files
The MATCH FILES command joins datasets.
Usage
match files /file=left /file=right /by KEYVARS.
The final dataset contains all rows and columns from all datasets. Variables are taken in order, from datasets in order. If a variable is present in more than one dataset, its values are taken from the first dataset in which it appears, and its metadata is taken from the first dataset that has any metadata (variable label, value labels, or missing values) set for it.
The key variables must be defined with the same format in each dataset. Cases must be uniquely identified by the key variables in each dataset, unless the /TABLE subcommand is used, in which case uniqueness is required only of the dataset specified on the /TABLE subcommand itself.
For string key variables, the same format also means the same length, except in PSPP version 2.0 or later.
Each dataset must be presorted by the key variables.
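The presorting requirement can be sketched as follows. This is a minimal example, assuming two open datasets named left and right (as in the usage line above) and a hypothetical key variable id:

```spss
* Sort each input by the key before joining.
* The dataset names left and right and the key id are assumptions.
dataset activate left.
sort cases by id.
dataset activate right.
sort cases by id.
match files /file=left /file=right /by id.
execute.
```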
File
Each /FILE subcommand takes one of:
- a star (*) indicating the active dataset
- the name of a dataset
- a filename or file handle
If the active dataset is included in a join and referenced by a star (*), that dataset will be modified in-place by the join.
If the active dataset is included in a join and referenced by name, the final dataset retains that name.
If the active dataset is not included, the final dataset is unnamed and becomes the active dataset.
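As an illustration of mixing the forms, the sketch below joins the active dataset, referenced by a star, with a saved data file. The filename right.sav and the key variable id are hypothetical:

```spss
* The star refers to the active dataset, which is modified in place.
* The filename and the key variable are assumptions for illustration.
match files /file=* /file='right.sav' /by id.
execute.
```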
Table
The /TABLE subcommand extends MATCH FILES to join lookup tables: each case from the /FILE datasets is matched to the lookup table row with the same key values.
match files /file=foo /table=states /by statecode.
Each /TABLE subcommand takes one of:
- the name of a data set
- a filename or file handle
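A small worked sketch of a lookup join may help. The sample data, the variable names person and statename, and the state names are all made up for illustration, and the sketch assumes that a named dataset stays open when DATA LIST creates a new one:

```spss
* Hypothetical case data: several people per state code.
data list list / person (f8.0) statecode (a2).
begin data
1 CA
2 CA
3 NY
end data.
dataset name foo.
sort cases by statecode.

* Hypothetical lookup table: one row per state code.
data list list / statecode (a2) statename (a10).
begin data
CA California
NY Empire
end data.
dataset name states.
sort cases by statecode.

* Attach the state name to every person with a matching code.
dataset activate foo.
match files /file=* /table=states /by statecode.
list.
```

Only the lookup table needs unique keys here; the foo dataset repeats the code CA, which is allowed for a /FILE input when /TABLE is used.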
In
The /IN subcommand must immediately follow a /FILE subcommand. It creates a flag variable for that dataset: 1 for any case that is present in it, 0 otherwise.
The flag variable will be non-missing for all cases in the final dataset and will be appended to the end of the variables.
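One common use of the flags is checking how well two inputs overlap. The following is a sketch only; the dataset names, the key id, and the flag names inleft and inright are assumptions:

```spss
* Flag which input each case came from.
match files /file=left /in=inleft /file=right /in=inright /by id.
execute.
* Cross-tabulate the flags to count matched and unmatched cases.
crosstabs /tables=inleft by inright.
```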
Rename
The /RENAME subcommand applies renames to the /FILE subcommand preceding it.
These renames take place before the datasets are joined. The key variables can be renamed to their final names.
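For example, if the key is stored under different names in the two inputs, it can be renamed on the way in. In this sketch the names person_id and id are hypothetical, and both datasets are assumed to be sorted by their own key already:

```spss
* The key is called id in left and person_id in right.
* Rename it in right so that both inputs share the key name id.
match files /file=left /file=right /rename=(person_id = id) /by id.
execute.
```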
First and Last
The /FIRST and /LAST subcommands append flag variables that mark the first and last matches by the key variables. This is generally only useful with /TABLE joins.
match files /file=population /first=headofhousehold /table=households /by id.
The MATCH FILES command can be used with a single dataset, in which case these subcommands can be used to mark the sequence of cases within a group.
match files /file=* /by id /first=PrimaryFirst /last=PrimaryLast.
do if PrimaryFirst=1.
compute MatchSequence = 1 - PrimaryLast.
else.
compute MatchSequence = MatchSequence + 1.
end if.
leave MatchSequence.
Keep and Drop
The /KEEP and /DROP subcommands specify a list of variables to keep or drop from the final dataset.
Any variable created by a /IN, /FIRST, or /LAST subcommand cannot be dropped.
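A short sketch of trimming the result, assuming a hypothetical scratch variable named tempvar that should not survive the join:

```spss
* Drop an unwanted variable from the joined result.
* The dataset names, the key id, and tempvar are assumptions.
match files /file=left /file=right /by id /drop=tempvar.
execute.
```

Using /KEEP with an explicit variable list in place of /DROP would keep only the listed variables.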
Data Model
The MATCH FILES command reads all datasets and data files named on /FILE and /TABLE subcommands.
The MATCH FILES command recognizes FILTER status and preserves it, although filtered cases are included in the final dataset.