Differences between revisions 5 and 14 (spanning 9 versions)

Joining Data with SPSS

SPSS offers several commands for joining datasets.

Contents

Joining Data with SPSS

Full Join

The primary method for any join in SPSS is the MATCH FILES command.

match files
  /file=LEFT
  /file=RIGHT
  /by ID.

If the datasets are known to have no overlap, the ADD FILES command is also an option.

Left Join

match files
  /file=LEFT /in=keep
  /file=RIGHT
  /by ID.
select if keep=1.

The STAR JOIN command is also available.

Right Join

match files
  /file=LEFT
  /file=RIGHT /in=keep
  /by ID.
select if keep=1.

Inner Join

match files
  /file=LEFT /in=left
  /file=RIGHT /in=right
  /by ID.
select if left=1 and right=1.

CategoryRicottone

-  ⇤ ← Revision 5 as of 2022-08-21 21:06:10 → 
  Size: 4651
  Editor: DominicRicottone
  Comment:
+   ← Revision 14 as of 2023-06-09 21:03:49 → ⇥
  Size: 889
  Editor: DominicRicottone
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 1:
-= SPSS Joining Data =
+= Joining Data with SPSS =
 Line 3:
-SPSS offers three commands for joining data.
+SPSS offers several commands for joining datasets.
 Line 11:
-== Match Files ==
+== Full Join ==
 Line 13:
-The '''`MATCH FILE`''' command is used for 1:1 joins, where all cases are uniquely identified.



=== Syntax and Prerequisites ===

The basic syntax for `MATCH FILE` is:
+The primary method for any join in SPSS is the [[SPSS/MatchFiles|MATCH FILES]] command.
-Line 28:
+Line 22:
+If the datasets are known to have no overlap, the [[SPSS/AddFiles|ADD FILES]] command is also an option.
-Line 29:
+Line 24:
-==== Files ====

Each `/file` subcommand takes one of:

 * a star (`*`) indicating the active data set
 * the name of a data set
 * a valid filename or file handle

If the active dataset is included in a join and referenced by a star (`*`), that dataset will be replaced in-place by the join.
+----
-Line 42:
+Line 28:
-==== IDs ====

All of the following must be satisfied by the key variable(s) named on the `/by` subcommand.

 1. It must be defined as the same type on all files
 2. It must be unique in each file
 3. Each file must be pre-sorted

In other words, each join could be guarded by:
+== Left Join ==
-Line 53:
+Line 31:
-dataset activate LEFT.
sort cases by ID.
compute dup=0.
if (ID=lag(ID)) dup=1.
select if dup=0.
execute.

dataset activate RIGHT.
sort cases by ID.
compute dup=0.
if (ID=lag(ID)) dup=1.
select if dup=0.
execute.
+match files
  /file=LEFT /in=keep
  /file=RIGHT
  /by ID.
select if keep=1.
-Line 68:
+Line 38:
-If a key variable is a string, it must additionally be defined as the same length on all files.
+The [[SPSS/StarJoin|STAR JOIN]] command is also available.

----
-Line 72:
+Line 44:
-=== Full Join ===
+== Right Join ==
-Line 75:
+Line 47:
-dataset activate LEFT.
-Line 77:
+Line 48:
-  /file=*
  /file=RIGHT
+  /file=LEFT
  /file=RIGHT /in=keep
-Line 80:
+Line 51:
-execute.
}}}



=== Left Join ===

{{{
dataset activate LEFT.
match files
  /file=*     /in=flag_left
  /file=RIGHT /in=flag_right
  /by ID.
select if (flag_left=1).
execute.
}}}



=== Right Join ===

{{{
dataset activate LEFT.
match files
  /file=*     /in=flag_left
  /file=RIGHT /in=flag_right
  /by ID.
select if (flag_right=1).
execute.
}}}



=== Inner Join ===

{{{
dataset activate LEFT.
match files
  /file=*     /in=flag_left
  /file=RIGHT /in=flag_right
  /by ID.
select if (flag_left=1 and flag_right=1).
execute.
+select if keep=1.
-Line 129:
+Line 58:
-== Match Tables ==

The `MATCH FILE` command has an extension through the `/table` subcommand. This is useful for appending higher-level variables by way of lookup tables. For example, appending state-level statistics to individual-level data.



=== Syntax and Prerequisites ===

The basic syntax is:
+== Inner Join ==
-Line 141:
+Line 62:
-  /file=MASTER
  /table=LOOKUP1
+  /file=LEFT /in=left
  /file=RIGHT /in=right
-Line 144:
+Line 65:
+select if left=1 and right=1.
-Line 145:
+Line 67:
-==== Files ====

Each `/file` and `/table` subcommand takes one of:

 * the name of a data set
 * a valid filename or file handle

If the active dataset is included in a join and referenced by a star (`*`), that dataset will be replaced in-place by the join.



==== IDs ====

All of the following must be satisfied by the key variable(s) named on the `/by` subcommand.

 1. It must be defined as the same type on all files
 2. Each file must be pre-sorted

A key variable must be unique across all cases in all datasets specified on a `/table` subcommand.

In other words, each join could be guarded by:

{{{
dataset activate MASTER.
sort cases by ID.
execute.

dataset activate LOOKUP.
sort cases by ID.
compute dup=0.
if (ID=lag(ID)) dup=1.
select if dup=0.
execute.
}}}

If the key variable is a string, it must additionally be defined as the same length on all files.

----



== Update File ==

The '''`UPDATE FILE`''' command is used to overwrite values in a master file with non-null values in transaction files. Cases and variables not present in the master file are appended.



=== Syntax and Prerequisites ===

The basic syntax is:

{{{
update file=MASTER
  /file=TRANSACTION1
  /by ID.
}}}



==== Files ====

Each `/file` subcommand takes one of:

 * a star (`*`) indicating the active data set
 * the name of a data set
 * a valid filename or file handle

The active dataset can be specified on a `/file` subcommand with a star (`*`). That dataset would then be replaced in-place by the join.

The first dataset specified on a `/file` subcommand is always the master file.


==== IDs ====

All of the following must be satisfied by the key variable(s) named on the `/by` subcommand.

 1. It must be defined as the same type on all files
 2. It must be unique in each file
 3. Each file must be pre-sorted

In other words, each join could be guarded by:

{{{
dataset activate MASTER.
sort cases by ID.
execute.

dataset activate LOOKUP.
sort cases by ID.
compute dup=0.
if (ID=lag(ID)) dup=1.
select if dup=0.
execute.
}}}

If a key variable is a string, it must additionally be defined as the same length on all files.

Diff for "SPSS/JoiningData"

Joining Data with SPSS

Full Join

Left Join

Right Join

Inner Join