Differences between revisions 1 and 2
Revision 1 as of 2021-11-12 20:49:00
Size: 4643
Comment:
Revision 2 as of 2022-05-27 14:53:06
Size: 4634
Comment:
Deletions are marked like this. Additions are marked like this.
Line 44: Line 44:
All of the following must be satisfied by the key variable(s) named on the `/in` subcommand. All of the following must be satisfied by the key variable(s) named on the `/by` subcommand.
Line 161: Line 161:
All of the following must be satisfied by the key variable(s) named on the `/in` subcommand. All of the following must be satisfied by the key variable(s) named on the `/by` subcommand.
Line 200: Line 200:
match files
  /
file=MASTER
update file=MASTER
Line 223: Line 222:
All of the following must be satisfied by the key variable(s) named on the `/in` subcommand. All of the following must be satisfied by the key variable(s) named on the `/by` subcommand.

SPSS Joining

SPSS offers three commands for joining data.


Match Files

The MATCH FILE command is used for 1:1 joins, where all cases are uniquely identified.

Syntax and Prerequisites

The basic syntax for MATCH FILE is:

match files
  /file=LEFT
  /file=RIGHT
  /by ID.

Files

Each /file subcommand takes one of:

  • a star (*) indicating the active data set

  • the name of a data set
  • a valid filename or file handle

If the active dataset is included in a join and referenced by a star (*), that dataset will be replaced in-place by the join.

IDs

All of the following must be satisfied by the key variable(s) named on the /by subcommand.

  1. It must be defined as the same type on all files
  2. It must be unique in each file
  3. Each file must be pre-sorted

In other words, each join could be guarded by:

dataset activate LEFT.
sort cases by ID.
compute dup=0.
if (ID=lag(ID)) dup=1.
select if dup=0.
execute.

dataset activate RIGHT.
sort cases by ID.
compute dup=0.
if (ID=lag(ID)) dup=1.
select if dup=0.
execute.

If a key variable is a string, it must additionally be defined as the same length on all files.

Full Join

dataset activate LEFT.
match files
  /file=*
  /file=RIGHT
  /by ID.
execute.

Left Join

dataset activate LEFT.
match files
  /file=*     /in=flag_left
  /file=RIGHT /in=flag_right
  /by ID.
select if (flag_left=1).
execute.

Right Join

dataset activate LEFT.
match files
  /file=*     /in=flag_left
  /file=RIGHT /in=flag_right
  /by ID.
select if (flag_right=1).
execute.

Inner Join

dataset activate LEFT.
match files
  /file=*     /in=flag_left
  /file=RIGHT /in=flag_right
  /by ID.
select if (flag_left=1 and flag_right=1).
execute.


Match Tables

The MATCH FILE command has an extension through the /table subcommand. This is useful for appending higher-level variables by way of lookup tables. For example, appending state-level statistics to individual-level data.

Syntax and Prerequisites

The basic syntax is:

match files
  /file=MASTER
  /table=LOOKUP1
  /by ID.

Files

Each /file and /table subcommand takes one of:

  • the name of a data set
  • a valid filename or file handle

If the active dataset is included in a join and referenced by a star (*), that dataset will be replaced in-place by the join.

IDs

All of the following must be satisfied by the key variable(s) named on the /by subcommand.

  1. It must be defined as the same type on all files
  2. Each file must be pre-sorted

A key variable must be unique across all cases in all datasets specified on a /table subcommand.

In other words, each join could be guarded by:

dataset activate MASTER.
sort cases by ID.
execute.

dataset activate LOOKUP.
sort cases by ID.
compute dup=0.
if (ID=lag(ID)) dup=1.
select if dup=0.
execute.

If the key variable is a string, it must additionally be defined as the same length on all files.


Update File

The UPDATE FILE command is used to overwrite values in a master file with non-null values in transaction files. Cases and variables not present in the master file are appended.

Syntax and Prerequisites

The basic syntax is:

update file=MASTER
  /file=TRANSACTION1
  /by ID.

Files

Each /file subcommand takes one of:

  • a star (*) indicating the active data set

  • the name of a data set
  • a valid filename or file handle

The active dataset can be specified on a /file subcommand with a star (*). That dataset would then be replaced in-place by the join.

The first dataset specified on a /file subcommand is always the master file.

IDs

All of the following must be satisfied by the key variable(s) named on the /by subcommand.

  1. It must be defined as the same type on all files
  2. It must be unique in each file
  3. Each file must be pre-sorted

In other words, each join could be guarded by:

dataset activate MASTER.
sort cases by ID.
execute.

dataset activate LOOKUP.
sort cases by ID.
compute dup=0.
if (ID=lag(ID)) dup=1.
select if dup=0.
execute.

If a key variable is a string, it must additionally be defined as the same length on all files.


CategoryRicottone

SPSS/JoiningData (last edited 2023-06-09 21:03:49 by DominicRicottone)