Size: 4426
Comment:
|
← Revision 14 as of 2023-06-09 21:03:49 ⇥
Size: 889
Comment:
|
Deletions are marked like this. | Additions are marked like this. |
Line 1: | Line 1: |
= SPSS Joining Data = | = Joining Data with SPSS = |
Line 3: | Line 3: |
SPSS offers several commands for joining data. The syntax of the commands is first explained, then the equivalent(s) of standard SQL joins are listed. |
SPSS offers several commands for joining datasets. |
Line 13: | Line 11: |
== Match Files == | == Full Join == |
Line 15: | Line 13: |
The '''`MATCH FILE`''' command is ''primarily'' used for 1:1 joins, where all cases are uniquely identified in all datasets. Try: | The primary method for any join in SPSS is the [[SPSS/MatchFiles|MATCH FILES]] command. |
Line 24: | Line 22: |
If the datasets are known to have no overlap, the [[SPSS/AddFiles|ADD FILES]] command is also an option. | |
Line 25: | Line 24: |
=== Files === Each `/FILE` subcommand takes one of: * a star (`*`) indicating the active data set * the name of a data set * a valid filename or file handle If the active dataset is included in a join and referenced by a star (`*`), that dataset will be replaced in-place by the join. |
---- |
Line 38: | Line 28: |
=== Key Variables === All of the following must be satisfied by the key variable(s) named on the `/BY` subcommand. 1. It must be defined as the same [[SPSS/DataTypes|format]] on all files 2. It must be unique in each file 3. Each file must be pre-sorted In other words, each join could be guarded by: |
== Left Join == |
Line 49: | Line 31: |
dataset activate LEFT. sort cases by ID. compute dup=0. if (ID=lag(ID)) dup=1. select if dup=0. execute. dataset activate RIGHT. sort cases by ID. compute dup=0. if (ID=lag(ID)) dup=1. select if dup=0. execute. |
match files /file=LEFT /in=keep /file=RIGHT /by ID. select if keep=1. |
Line 64: | Line 38: |
If a key variable is a string, it must additionally be defined as the same length on all files. | The [[SPSS/StarJoin|STAR JOIN]] command is also available. ---- |
Line 68: | Line 44: |
=== Tables === The `MATCH FILE` command has an extension through the '''`/TABLE`''' subcommand. This is useful for appending higher-level variables by way of lookup tables. Try: |
== Right Join == |
Line 75: | Line 49: |
/table=LOOKUP | /file=RIGHT /in=keep |
Line 77: | Line 51: |
}}} Each `/TABLE` subcommand takes one of: * the name of a data set * a valid filename or file handle In the case of a `MATCH FILE ... /TABLE` command, the key variable(s) only need to be unique across in each `/TABLE` dataset. In other words, each join could be guarded by: {{{ dataset activate LEFT. sort cases by ID. execute. dataset activate LOOKUP. sort cases by ID. compute dup=0. if (ID=lag(ID)) dup=1. select if dup=0. execute. |
select if keep=1. |
Line 103: | Line 58: |
== Update File == The '''`UPDATE FILE`''' command is used to overwrite values in a master file with non-null values in transaction files. Try: |
== Inner Join == |
Line 108: | Line 61: |
update file=LEFT /file=TRANSACTION |
match files /file=LEFT /in=left /file=RIGHT /in=right |
Line 111: | Line 65: |
}}} The final dataset contains all rows and variables from all datasets. Rows and variables originating from a transaction file are appended. === Files === Each `/FILE` subcommand takes one of: * a star (`*`) indicating the active data set * the name of a data set * a valid filename or file handle The active dataset can be specified on a `/FILE` subcommand with a star (`*`). That dataset would then be replaced in-place by the join. The first dataset specified on a `/FILE` subcommand is always the master file. === Key Variables === All of the following must be satisfied by the key variable(s) named on the `/BY` subcommand. 1. It must be defined as the same type on all files 2. It must be unique in each file 3. Each file must be pre-sorted In other words, each join could be guarded by: {{{ dataset activate LEFT. sort cases by ID. execute. dataset activate TRANSACTION. sort cases by ID. compute dup=0. if (ID=lag(ID)) dup=1. select if dup=0. execute. }}} If a key variable is a string, it must additionally be defined as the same length on all files. ---- == Star Join == The final dataset contains only the variables specified on the `SELECT` and `JOIN` subcommand, and only the rows originating from the case file. Note: unsupported in SPSS version 20 or earlier. ---- == Joins == === Full Join === {{{ dataset activate LEFT. match files /file=* /file=RIGHT /by ID. execute. }}} === Left Join === {{{ dataset activate LEFT. match files /file=* /in=flag_left /file=RIGHT /in=flag_right /by ID. select if (flag_left=1). execute. }}} === Right Join === {{{ dataset activate LEFT. match files /file=* /in=flag_left /file=RIGHT /in=flag_right /by ID. select if (flag_right=1). execute. }}} === Inner Join === {{{ dataset activate LEFT. match files /file=* /in=flag_left /file=RIGHT /in=flag_right /by ID. select if (flag_left=1 and flag_right=1). execute. |
select if left=1 and right=1. |
Joining Data with SPSS
SPSS offers several commands for joining datasets.
Full Join
The primary method for any join in SPSS is the MATCH FILES command.
match files /file=LEFT /file=RIGHT /by ID.
If the datasets are known to have no overlap, the ADD FILES command is also an option.
Left Join
match files /file=LEFT /in=keep /file=RIGHT /by ID. select if keep=1.
The STAR JOIN command is also available.
Right Join
match files /file=LEFT /file=RIGHT /in=keep /by ID. select if keep=1.
Inner Join
match files /file=LEFT /in=left /file=RIGHT /in=right /by ID. select if left=1 and right=1.