Joining Data with SPSS

Firstly the SPSS commands are listed with an explanation of syntax. Secondly the equivalents of standard SQL joins are listed.


Match Files

The MATCH FILE command is primarily used for 1:1 joins, where all cases are uniquely identified in all datasets. Try:

match files
  /file=LEFT
  /file=RIGHT
  /by ID.

The final dataset contains all rows and variables from all datasets. Variables are taken in order from the datasets in order. For variables originating from more than one dataset, values are taken from the first dataset they appear in and metadata is taken from the first dataset with any (i.e. variable label, value labels, or missing values) metadata set.

For more details on the MATCH FILES command, see here.


Update File

The UPDATE FILE command is used to overwrite values in a master file with non-null values in transaction files. Try:

update file=LEFT
  /file=TRANSACTION
  /by ID.

The final dataset contains all rows and variables from all datasets. Rows and variables originating from a transaction file are appended.

For more details on the UPDATE FILE command, see here.


Star Join

The final dataset contains only the variables specified on the SELECT and JOIN subcommand, and only the rows originating from the case file.

Note: unsupported in SPSS version 20 or earlier.


Joins

Full Join

dataset activate LEFT.
match files
  /file=*
  /file=RIGHT
  /by ID.
execute.

Left Join

dataset activate LEFT.
match files
  /file=*     /in=flag_left
  /file=RIGHT /in=flag_right
  /by ID.
select if (flag_left=1).
execute.

Right Join

dataset activate LEFT.
match files
  /file=*     /in=flag_left
  /file=RIGHT /in=flag_right
  /by ID.
select if (flag_right=1).
execute.

Inner Join

dataset activate LEFT.
match files
  /file=*     /in=flag_left
  /file=RIGHT /in=flag_right
  /by ID.
select if (flag_left=1 and flag_right=1).
execute.


CategoryRicottone