Size: 4651
Comment:
|
← Revision 14 as of 2023-06-09 21:03:49 ⇥
Size: 889
Comment:
|
Deletions are marked like this. | Additions are marked like this. |
Line 1: | Line 1: |
= SPSS Joining Data = | = Joining Data with SPSS = |
Line 3: | Line 3: |
SPSS offers three commands for joining data. | SPSS offers several commands for joining datasets. |
Line 11: | Line 11: |
== Match Files == | == Full Join == |
Line 13: | Line 13: |
The '''`MATCH FILE`''' command is used for 1:1 joins, where all cases are uniquely identified. === Syntax and Prerequisites === The basic syntax for `MATCH FILE` is: |
The primary method for any join in SPSS is the [[SPSS/MatchFiles|MATCH FILES]] command. |
Line 28: | Line 22: |
If the datasets are known to have no overlap, the [[SPSS/AddFiles|ADD FILES]] command is also an option. | |
Line 29: | Line 24: |
==== Files ==== Each `/file` subcommand takes one of: * a star (`*`) indicating the active data set * the name of a data set * a valid filename or file handle If the active dataset is included in a join and referenced by a star (`*`), that dataset will be replaced in-place by the join. |
---- |
Line 42: | Line 28: |
==== IDs ==== All of the following must be satisfied by the key variable(s) named on the `/by` subcommand. 1. It must be defined as the same type on all files 2. It must be unique in each file 3. Each file must be pre-sorted In other words, each join could be guarded by: |
== Left Join == |
Line 53: | Line 31: |
dataset activate LEFT. sort cases by ID. compute dup=0. if (ID=lag(ID)) dup=1. select if dup=0. execute. dataset activate RIGHT. sort cases by ID. compute dup=0. if (ID=lag(ID)) dup=1. select if dup=0. execute. |
match files /file=LEFT /in=keep /file=RIGHT /by ID. select if keep=1. |
Line 68: | Line 38: |
If a key variable is a string, it must additionally be defined as the same length on all files. | The [[SPSS/StarJoin|STAR JOIN]] command is also available. ---- |
Line 72: | Line 44: |
=== Full Join === | == Right Join == |
Line 75: | Line 47: |
dataset activate LEFT. | |
Line 77: | Line 48: |
/file=* /file=RIGHT |
/file=LEFT /file=RIGHT /in=keep |
Line 80: | Line 51: |
execute. }}} === Left Join === {{{ dataset activate LEFT. match files /file=* /in=flag_left /file=RIGHT /in=flag_right /by ID. select if (flag_left=1). execute. }}} === Right Join === {{{ dataset activate LEFT. match files /file=* /in=flag_left /file=RIGHT /in=flag_right /by ID. select if (flag_right=1). execute. }}} === Inner Join === {{{ dataset activate LEFT. match files /file=* /in=flag_left /file=RIGHT /in=flag_right /by ID. select if (flag_left=1 and flag_right=1). execute. |
select if keep=1. |
Line 129: | Line 58: |
== Match Tables == The `MATCH FILE` command has an extension through the `/table` subcommand. This is useful for appending higher-level variables by way of lookup tables. For example, appending state-level statistics to individual-level data. === Syntax and Prerequisites === The basic syntax is: |
== Inner Join == |
Line 141: | Line 62: |
/file=MASTER /table=LOOKUP1 |
/file=LEFT /in=left /file=RIGHT /in=right |
Line 144: | Line 65: |
select if left=1 and right=1. | |
Line 145: | Line 67: |
==== Files ==== Each `/file` and `/table` subcommand takes one of: * the name of a data set * a valid filename or file handle If the active dataset is included in a join and referenced by a star (`*`), that dataset will be replaced in-place by the join. ==== IDs ==== All of the following must be satisfied by the key variable(s) named on the `/by` subcommand. 1. It must be defined as the same type on all files 2. Each file must be pre-sorted A key variable must be unique across all cases in all datasets specified on a `/table` subcommand. In other words, each join could be guarded by: {{{ dataset activate MASTER. sort cases by ID. execute. dataset activate LOOKUP. sort cases by ID. compute dup=0. if (ID=lag(ID)) dup=1. select if dup=0. execute. }}} If the key variable is a string, it must additionally be defined as the same length on all files. ---- == Update File == The '''`UPDATE FILE`''' command is used to overwrite values in a master file with non-null values in transaction files. Cases and variables not present in the master file are appended. === Syntax and Prerequisites === The basic syntax is: {{{ update file=MASTER /file=TRANSACTION1 /by ID. }}} ==== Files ==== Each `/file` subcommand takes one of: * a star (`*`) indicating the active data set * the name of a data set * a valid filename or file handle The active dataset can be specified on a `/file` subcommand with a star (`*`). That dataset would then be replaced in-place by the join. The first dataset specified on a `/file` subcommand is always the master file. ==== IDs ==== All of the following must be satisfied by the key variable(s) named on the `/by` subcommand. 1. It must be defined as the same type on all files 2. It must be unique in each file 3. Each file must be pre-sorted In other words, each join could be guarded by: {{{ dataset activate MASTER. sort cases by ID. execute. dataset activate LOOKUP. sort cases by ID. compute dup=0. if (ID=lag(ID)) dup=1. select if dup=0. execute. }}} If a key variable is a string, it must additionally be defined as the same length on all files. |
Joining Data with SPSS
SPSS offers several commands for joining datasets.
Full Join
The primary method for any join in SPSS is the MATCH FILES command.
match files /file=LEFT /file=RIGHT /by ID.
If the datasets are known to have no overlap, the ADD FILES command is also an option.
Left Join
match files /file=LEFT /in=keep /file=RIGHT /by ID. select if keep=1.
The STAR JOIN command is also available.
Right Join
match files /file=LEFT /file=RIGHT /in=keep /by ID. select if keep=1.
Inner Join
match files /file=LEFT /in=left /file=RIGHT /in=right /by ID. select if left=1 and right=1.