Differences between revisions 2 and 4 (spanning 2 versions)
Revision 2 as of 2022-05-27 14:53:06
Size: 4634
Comment:
Revision 4 as of 2022-08-21 21:06:03
Size: 4685
Comment:
Deletions are marked like this. Additions are marked like this.
Line 1: Line 1:
## page was renamed from SPSS/Joining
Line 13: Line 14:
The `MATCH FILE` command is used for 1:1 joins, where all cases are uniquely identified. The '''`MATCH FILE`''' command is used for 1:1 joins, where all cases are uniquely identified.
Line 191: Line 192:
The `UPDATE FILE` command is used to overwrite values in a master file with non-null values in transaction files. Cases and variables not present in the master file are appended. The '''`UPDATE FILE`''' command is used to overwrite values in a master file with non-null values in transaction files. Cases and variables not present in the master file are appended.

SPSS Joining

SPSS offers three commands for joining data.


Match Files

The MATCH FILE command is used for 1:1 joins, where all cases are uniquely identified.

Syntax and Prerequisites

The basic syntax for MATCH FILE is:

match files
  /file=LEFT
  /file=RIGHT
  /by ID.

Files

Each /file subcommand takes one of:

  • a star (*) indicating the active data set

  • the name of a data set
  • a valid filename or file handle

If the active dataset is included in a join and referenced by a star (*), that dataset will be replaced in-place by the join.

IDs

All of the following must be satisfied by the key variable(s) named on the /by subcommand.

  1. It must be defined as the same type on all files
  2. It must be unique in each file
  3. Each file must be pre-sorted

In other words, each join could be guarded by:

dataset activate LEFT.
sort cases by ID.
compute dup=0.
if (ID=lag(ID)) dup=1.
select if dup=0.
execute.

dataset activate RIGHT.
sort cases by ID.
compute dup=0.
if (ID=lag(ID)) dup=1.
select if dup=0.
execute.

If a key variable is a string, it must additionally be defined as the same length on all files.

Full Join

dataset activate LEFT.
match files
  /file=*
  /file=RIGHT
  /by ID.
execute.

Left Join

dataset activate LEFT.
match files
  /file=*     /in=flag_left
  /file=RIGHT /in=flag_right
  /by ID.
select if (flag_left=1).
execute.

Right Join

dataset activate LEFT.
match files
  /file=*     /in=flag_left
  /file=RIGHT /in=flag_right
  /by ID.
select if (flag_right=1).
execute.

Inner Join

dataset activate LEFT.
match files
  /file=*     /in=flag_left
  /file=RIGHT /in=flag_right
  /by ID.
select if (flag_left=1 and flag_right=1).
execute.


Match Tables

The MATCH FILE command has an extension through the /table subcommand. This is useful for appending higher-level variables by way of lookup tables. For example, appending state-level statistics to individual-level data.

Syntax and Prerequisites

The basic syntax is:

match files
  /file=MASTER
  /table=LOOKUP1
  /by ID.

Files

Each /file and /table subcommand takes one of:

  • the name of a data set
  • a valid filename or file handle

If the active dataset is included in a join and referenced by a star (*), that dataset will be replaced in-place by the join.

IDs

All of the following must be satisfied by the key variable(s) named on the /by subcommand.

  1. It must be defined as the same type on all files
  2. Each file must be pre-sorted

A key variable must be unique across all cases in all datasets specified on a /table subcommand.

In other words, each join could be guarded by:

dataset activate MASTER.
sort cases by ID.
execute.

dataset activate LOOKUP.
sort cases by ID.
compute dup=0.
if (ID=lag(ID)) dup=1.
select if dup=0.
execute.

If the key variable is a string, it must additionally be defined as the same length on all files.


Update File

The UPDATE FILE command is used to overwrite values in a master file with non-null values in transaction files. Cases and variables not present in the master file are appended.

Syntax and Prerequisites

The basic syntax is:

update file=MASTER
  /file=TRANSACTION1
  /by ID.

Files

Each /file subcommand takes one of:

  • a star (*) indicating the active data set

  • the name of a data set
  • a valid filename or file handle

The active dataset can be specified on a /file subcommand with a star (*). That dataset would then be replaced in-place by the join.

The first dataset specified on a /file subcommand is always the master file.

IDs

All of the following must be satisfied by the key variable(s) named on the /by subcommand.

  1. It must be defined as the same type on all files
  2. It must be unique in each file
  3. Each file must be pre-sorted

In other words, each join could be guarded by:

dataset activate MASTER.
sort cases by ID.
execute.

dataset activate LOOKUP.
sort cases by ID.
compute dup=0.
if (ID=lag(ID)) dup=1.
select if dup=0.
execute.

If a key variable is a string, it must additionally be defined as the same length on all files.


CategoryRicottone

SPSS/JoiningData (last edited 2023-06-09 21:03:49 by DominicRicottone)