= SPSS Match Files =

The '''`MATCH FILES`''' command joins datasets.

----

== Usage ==

{{{
match files
  /file=left
  /file=right
  /by KEYVARS.
}}}

The final dataset contains all rows and columns from all datasets. Variables are taken in order within each dataset, and datasets are taken in the order they are named. If a variable is present in more than one dataset, values are taken from the first dataset in which it appears, and metadata (i.e. variable label, value labels, or missing values) is taken from the first dataset that has any set.

The key variables must be defined with the same format in each dataset. For string variables this includes the same length, except in PSPP version 2.0 or later.

Cases must be uniquely identified by the key variables in each dataset. The exception is a join using the '''`/TABLE`''' subcommand, where this is only required of the dataset specified on the `/TABLE` subcommand itself.

Each dataset must be presorted by the key variables.

=== File ===

Each '''`/FILE`''' subcommand takes one of:

 * a star (`*`) indicating the active dataset
 * the name of a dataset
 * a filename or [[SPSS/FileHandle|file handle]]

If the active dataset is included in a join and referenced by a star (`*`), that dataset will be modified in-place by the join. If the active dataset is included in a join and referenced by name, the final dataset will retain the name. If the active dataset is not included, the final dataset is unnamed and becomes the active dataset.

=== Table ===

The `/TABLE` subcommand extends `MATCH FILES` to join lookup tables.

{{{
match files
  /file=foo
  /table=states
  /by statecode.
}}}

Each `/TABLE` subcommand takes one of:

 * the name of a dataset
 * a filename or file handle

=== In ===

The '''`/IN`''' subcommand must immediately follow a `/FILE` subcommand. It creates a flag variable for that dataset: 1 for any case that is present in it, 0 otherwise.
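
As a minimal sketch of these flags, assume two hypothetical datasets named `survey` and `admin` that share a key variable `id`. The flag names `InSurvey` and `InAdmin` are likewise illustrative only.

{{{
* Hypothetical names: survey, admin, id, InSurvey, InAdmin.
match files
  /file=survey /in=InSurvey
  /file=admin /in=InAdmin
  /by id.
execute.
}}}

With these names, a case found only in `survey` would have `InSurvey` set to 1 and `InAdmin` set to 0.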
The flag variable will be non-missing for all cases in the final dataset and will be appended to the end of the variables.

=== Rename ===

The '''`/RENAME`''' subcommand applies [[SPSS/RenameVariables|renames]] to the `/FILE` subcommand preceding it. These renames take place before the datasets are joined, so the key variables can be renamed to their final names.

=== First and Last ===

The '''`/FIRST`''' and '''`/LAST`''' subcommands append flag variables that mark the first and last matches by the key variables. This is generally only useful with `/TABLE` joins.

{{{
match files
  /file=population
  /table=households
  /by id
  /first=headofhousehold.
}}}

The `MATCH FILES` command can be used with a single dataset, in which case these subcommands can be used to mark the sequence of cases within a group.

{{{
match files
  /file=*
  /by id
  /first=PrimaryFirst
  /last=PrimaryLast.
do if PrimaryFirst=1.
compute MatchSequence = 1 - PrimaryLast.
else.
compute MatchSequence = MatchSequence + 1.
end if.
leave MatchSequence.
}}}

=== Keep and Drop ===

The '''`/KEEP`''' and '''`/DROP`''' subcommands specify a list of variables to keep or drop from the final dataset. Variables created by an `/IN`, `/FIRST`, or `/LAST` subcommand cannot be dropped.

----

== Data Model ==

The `MATCH FILES` command reads all datasets and data files named on `/FILE` and `/TABLE` subcommands.

The `MATCH FILES` command recognizes [[SPSS/Filter|FILTER]] status and preserves it, although filtered cases are included in the final dataset.

----

== See also ==

[[https://www.gnu.org/software/pspp/manual/html_node/MATCH-FILES.html|PSPP manual for MATCH FILES]]

----
CategoryRicottone