Differences between revisions 1 and 13 (spanning 12 versions)

Joining Data with SPSS

This page first lists the SPSS commands for joining data, with an explanation of their syntax. It then gives the SPSS equivalents of the standard SQL joins.

Match Files

The MATCH FILES command is primarily used for 1:1 joins, where every case is uniquely identified in every dataset. Try:

match files
  /file=LEFT
  /file=RIGHT
  /by ID.

The final dataset contains all rows and variables from all datasets. Variables appear in the order they are encountered, reading the datasets in the order listed. For a variable that appears in more than one dataset, values are taken from the first dataset it appears in, and metadata (variable label, value labels, or missing values) is taken from the first dataset that has any metadata set for it.
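MATCH FILES with a BY key expects every input to already be sorted in ascending order of that key. A minimal sketch of guarding the join, assuming two open datasets named LEFT and RIGHT that share a key variable ID (the same placeholder names used in the example above):

```spss
* Sort each input by the key before joining.
dataset activate RIGHT.
sort cases by ID.

dataset activate LEFT.
sort cases by ID.

* LEFT is now the active dataset and is replaced by the joined result.
match files
  /file=*
  /file=RIGHT
  /by ID.
execute.
```

Referring to the active dataset with a star replaces it in place. Naming both datasets explicitly, as in the first example, works as well.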

For more details on the MATCH FILES command, see the SPSS/MatchFiles page.

Update File

The UPDATE command (written as update file=... in syntax) is used to overwrite values in a master file with non-null values from transaction files. The first file named is always the master file. Try:

update file=LEFT
  /file=TRANSACTION
  /by ID.

The final dataset contains all rows and variables from all datasets. Rows and variables originating from a transaction file are appended.
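As with MATCH FILES, the inputs are expected to be sorted by the key before the update runs. A minimal sketch, assuming open datasets named LEFT (the master) and TRANSACTION with a shared key variable ID, as in the example above:

```spss
* Sort the transaction file and the master file by the key.
dataset activate TRANSACTION.
sort cases by ID.

dataset activate LEFT.
sort cases by ID.

* The star refers to the active dataset (LEFT), which is updated in place.
update file=*
  /file=TRANSACTION
  /by ID.
execute.
```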

For more details on the UPDATE command, see the SPSS/UpdateFile page.

Star Join

The STAR JOIN command joins a case file to one or more table (lookup) files. The final dataset contains only the variables specified on the SELECT and JOIN subcommands, and only the rows originating from the case file.

Note: unsupported in SPSS version 20 or earlier.
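This section gives no syntax, so here is a rough sketch of what a star join might look like. Every name in it is hypothetical: CASES is an open dataset acting as the case file, LOOKUP is an open dataset acting as the table file, REGION is the shared key, and INCOME and REGION_NAME are the variables to keep. Check the exact subcommand options against the command reference for your SPSS version.

```spss
* Hypothetical names throughout; adjust to your own datasets and variables.
* Keeps INCOME from the case file and REGION_NAME from the table file.
star join
  /select c.INCOME, t.REGION_NAME
  /from CASES as c
  /join LOOKUP as t
    on c.REGION = t.REGION
  /outfile file=*.
execute.
```

The aliases (c and t here) qualify each variable with the file it comes from, much like table aliases in SQL.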

Joins

Full Join

dataset activate LEFT.
match files
  /file=*
  /file=RIGHT
  /by ID.
execute.

Left Join

dataset activate LEFT.
match files
  /file=*     /in=flag_left
  /file=RIGHT /in=flag_right
  /by ID.
select if (flag_left=1).
execute.
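When the right-hand dataset is a pure lookup table, that is, when ID is unique within RIGHT, a left join can also be sketched with the TABLE subcommand of MATCH FILES. This is an alternative under that assumption, not a replacement for the flag-based approach above:

```spss
* Assumes ID is unique within RIGHT and both datasets are sorted by ID.
dataset activate LEFT.
match files
  /file=*
  /table=RIGHT
  /by ID.
execute.
```

Cases that exist only in RIGHT are not added to the result, so no flag variables or SELECT IF step are needed.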

Right Join

dataset activate LEFT.
match files
  /file=*     /in=flag_left
  /file=RIGHT /in=flag_right
  /by ID.
select if (flag_right=1).
execute.

Inner Join

dataset activate LEFT.
match files
  /file=*     /in=flag_left
  /file=RIGHT /in=flag_right
  /by ID.
select if (flag_left=1 and flag_right=1).
execute.

CategoryRicottone

- Revision 1 as of 2021-11-12 20:49:00
  Size: 4643
  Editor: DominicRicottone
  Comment:
+ Revision 13 as of 2023-01-13 23:16:47
  Size: 2263
  Editor: DominicRicottone
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 1:
-= SPSS Joining =
+= Joining Data with SPSS =
 Line 3:
-SPSS offers three commands for joining data.
+Firstly the SPSS commands are listed with an explanation of syntax. Secondly the equivalents of standard SQL joins are listed.
 Line 13:
-The `MATCH FILE` command is used for 1:1 joins, where all cases are uniquely identified.



=== Syntax and Prerequisites ===

The basic syntax for `MATCH FILE` is:
+The '''`MATCH FILE`''' command is ''primarily'' used for 1:1 joins, where all cases are uniquely identified in all datasets. Try:
-Line 28:
+Line 22:
+The final dataset contains all rows and variables from all datasets. Variables are taken in order from the datasets in order. For variables originating from more than one dataset, values are taken from the first dataset they appear in and metadata is taken from the first dataset with any (i.e. variable label, value labels, or missing values) metadata set.
-Line 29:
+Line 24:
+For more details on the `MATCH FILES` command, see [[SPSS/MatchFiles|here]].
-Line 30:
+Line 26:
-==== Files ====

Each `/file` subcommand takes one of:

 * a star (`*`) indicating the active data set
 * the name of a data set
 * a valid filename or file handle

If the active dataset is included in a join and referenced by a star (`*`), that dataset will be replaced in-place by the join.
+----
-Line 42:
+Line 30:
-==== IDs ====
+== Update File ==
-Line 44:
+Line 32:
-All of the following must be satisfied by the key variable(s) named on the `/in` subcommand.

 1. It must be defined as the same type on all files
 2. It must be unique in each file
 3. Each file must be pre-sorted

In other words, each join could be guarded by:
+The '''`UPDATE FILE`''' command is used to overwrite values in a master file with non-null values in transaction files. Try:
-Line 53:
+Line 35:
-dataset activate LEFT.
sort cases by ID.
compute dup=0.
if (ID=lag(ID)) dup=1.
select if dup=0.
execute.

dataset activate RIGHT.
sort cases by ID.
compute dup=0.
if (ID=lag(ID)) dup=1.
select if dup=0.
execute.
+update file=LEFT
  /file=TRANSACTION
  /by ID.
-Line 68:
+Line 40:
-If a key variable is a string, it must additionally be defined as the same length on all files.
+The final dataset contains all rows and variables from all datasets. Rows and variables originating from a transaction file are appended.

For more details on the `UPDATE FILE` command, see [[SPSS/UpdateFile|here]].

----
-Line 71:
+Line 47:
+== Star Join ==

The final dataset contains only the variables specified on the `SELECT` and `JOIN` subcommand, and only the rows originating from the case file.

Note: unsupported in SPSS version 20 or earlier.

----



== Joins ==
-Line 125:
+Line 113:
-----



== Match Tables ==

The `MATCH FILE` command has an extension through the `/table` subcommand. This is useful for appending higher-level variables by way of lookup tables. For example, appending state-level statistics to individual-level data.



=== Syntax and Prerequisites ===

The basic syntax is:

{{{
match files
  /file=MASTER
  /table=LOOKUP1
  /by ID.
}}}



==== Files ====

Each `/file` and `/table` subcommand takes one of:

 * the name of a data set
 * a valid filename or file handle

If the active dataset is included in a join and referenced by a star (`*`), that dataset will be replaced in-place by the join.



==== IDs ====

All of the following must be satisfied by the key variable(s) named on the `/in` subcommand.

 1. It must be defined as the same type on all files
 2. Each file must be pre-sorted

A key variable must be unique across all cases in all datasets specified on a `/table` subcommand.

In other words, each join could be guarded by:

{{{
dataset activate MASTER.
sort cases by ID.
execute.

dataset activate LOOKUP.
sort cases by ID.
compute dup=0.
if (ID=lag(ID)) dup=1.
select if dup=0.
execute.
}}}

If the key variable is a string, it must additionally be defined as the same length on all files.

----



== Update File ==

The `UPDATE FILE` command is used to overwrite values in a master file with non-null values in transaction files. Cases and variables not present in the master file are appended.



=== Syntax and Prerequisites ===

The basic syntax is:

{{{
match files
  /file=MASTER
  /file=TRANSACTION1
  /by ID.
}}}



==== Files ====

Each `/file` subcommand takes one of:

 * a star (`*`) indicating the active data set
 * the name of a data set
 * a valid filename or file handle

The active dataset can be specified on a `/file` subcommand with a star (`*`). That dataset would then be replaced in-place by the join.

The first dataset specified on a `/file` subcommand is always the master file.


==== IDs ====

All of the following must be satisfied by the key variable(s) named on the `/in` subcommand.

 1. It must be defined as the same type on all files
 2. It must be unique in each file
 3. Each file must be pre-sorted

In other words, each join could be guarded by:

{{{
dataset activate MASTER.
sort cases by ID.
execute.

dataset activate LOOKUP.
sort cases by ID.
compute dup=0.
if (ID=lag(ID)) dup=1.
select if dup=0.
execute.
}}}

If a key variable is a string, it must additionally be defined as the same length on all files.

Diff for "SPSS/JoiningData"