⇤ ← Revision 1 as of 2021-11-12 20:49:00
Size: 4643
Comment:
|
Size: 4634
Comment:
|
Deletions are marked like this. | Additions are marked like this. |
Line 44: | Line 44: |
All of the following must be satisfied by the key variable(s) named on the `/in` subcommand. | All of the following must be satisfied by the key variable(s) named on the `/by` subcommand. |
Line 161: | Line 161: |
All of the following must be satisfied by the key variable(s) named on the `/in` subcommand. | All of the following must be satisfied by the key variable(s) named on the `/by` subcommand. |
Line 200: | Line 200: |
match files /file=MASTER |
update file=MASTER |
Line 223: | Line 222: |
All of the following must be satisfied by the key variable(s) named on the `/in` subcommand. | All of the following must be satisfied by the key variable(s) named on the `/by` subcommand. |
SPSS Joining
SPSS offers three commands for joining data.
Contents
Match Files
The MATCH FILE command is used for 1:1 joins, where all cases are uniquely identified.
Syntax and Prerequisites
The basic syntax for MATCH FILE is:
match files /file=LEFT /file=RIGHT /by ID.
Files
Each /file subcommand takes one of:
a star (*) indicating the active data set
- the name of a data set
- a valid filename or file handle
If the active dataset is included in a join and referenced by a star (*), that dataset will be replaced in-place by the join.
IDs
All of the following must be satisfied by the key variable(s) named on the /by subcommand.
- It must be defined as the same type on all files
- It must be unique in each file
- Each file must be pre-sorted
In other words, each join could be guarded by:
dataset activate LEFT. sort cases by ID. compute dup=0. if (ID=lag(ID)) dup=1. select if dup=0. execute. dataset activate RIGHT. sort cases by ID. compute dup=0. if (ID=lag(ID)) dup=1. select if dup=0. execute.
If a key variable is a string, it must additionally be defined as the same length on all files.
Full Join
dataset activate LEFT. match files /file=* /file=RIGHT /by ID. execute.
Left Join
dataset activate LEFT. match files /file=* /in=flag_left /file=RIGHT /in=flag_right /by ID. select if (flag_left=1). execute.
Right Join
dataset activate LEFT. match files /file=* /in=flag_left /file=RIGHT /in=flag_right /by ID. select if (flag_right=1). execute.
Inner Join
dataset activate LEFT. match files /file=* /in=flag_left /file=RIGHT /in=flag_right /by ID. select if (flag_left=1 and flag_right=1). execute.
Match Tables
The MATCH FILE command has an extension through the /table subcommand. This is useful for appending higher-level variables by way of lookup tables. For example, appending state-level statistics to individual-level data.
Syntax and Prerequisites
The basic syntax is:
match files /file=MASTER /table=LOOKUP1 /by ID.
Files
Each /file and /table subcommand takes one of:
- the name of a data set
- a valid filename or file handle
If the active dataset is included in a join and referenced by a star (*), that dataset will be replaced in-place by the join.
IDs
All of the following must be satisfied by the key variable(s) named on the /by subcommand.
- It must be defined as the same type on all files
- Each file must be pre-sorted
A key variable must be unique across all cases in all datasets specified on a /table subcommand.
In other words, each join could be guarded by:
dataset activate MASTER. sort cases by ID. execute. dataset activate LOOKUP. sort cases by ID. compute dup=0. if (ID=lag(ID)) dup=1. select if dup=0. execute.
If the key variable is a string, it must additionally be defined as the same length on all files.
Update File
The UPDATE FILE command is used to overwrite values in a master file with non-null values in transaction files. Cases and variables not present in the master file are appended.
Syntax and Prerequisites
The basic syntax is:
update file=MASTER /file=TRANSACTION1 /by ID.
Files
Each /file subcommand takes one of:
a star (*) indicating the active data set
- the name of a data set
- a valid filename or file handle
The active dataset can be specified on a /file subcommand with a star (*). That dataset would then be replaced in-place by the join.
The first dataset specified on a /file subcommand is always the master file.
IDs
All of the following must be satisfied by the key variable(s) named on the /by subcommand.
- It must be defined as the same type on all files
- It must be unique in each file
- Each file must be pre-sorted
In other words, each join could be guarded by:
dataset activate MASTER. sort cases by ID. execute. dataset activate LOOKUP. sort cases by ID. compute dup=0. if (ID=lag(ID)) dup=1. select if dup=0. execute.
If a key variable is a string, it must additionally be defined as the same length on all files.