Size: 4651
Comment:
|
Size: 4202
Comment:
|
Deletions are marked like this. | Additions are marked like this. |
Line 3: | Line 3: |
SPSS offers three commands for joining data. | SPSS offers several commands for joining data. The syntax of the commands is first explained, then the equivalent(s) of standard SQL joins are listed. |
Line 13: | Line 15: |
The '''`MATCH FILE`''' command is used for 1:1 joins, where all cases are uniquely identified. === Syntax and Prerequisites === The basic syntax for `MATCH FILE` is: |
The '''`MATCH FILE`''' command is ''primarily'' used for 1:1 joins, where all cases are uniquely identified in all datasets. Try: |
Line 30: | Line 26: |
==== Files ==== Each `/file` subcommand takes one of: |
=== Files === Each `/FILE` subcommand takes one of: |
Line 42: | Line 38: |
==== IDs ==== | === Key Variables === All of the following must be satisfied by the key variable(s) named on the `/BY` subcommand. 1. It must be defined as the same [[SPSS/DataTypes|format]] on all files 2. It must be unique in each file 3. Each file must be pre-sorted In other words, each join could be guarded by: {{{ dataset activate LEFT. sort cases by ID. compute dup=0. if (ID=lag(ID)) dup=1. select if dup=0. execute. dataset activate RIGHT. sort cases by ID. compute dup=0. if (ID=lag(ID)) dup=1. select if dup=0. execute. }}} If a key variable is a string, it must additionally be defined as the same length on all files. === Tables === The `MATCH FILE` command has an extension through the '''`/TABLE`''' subcommand. This is useful for appending higher-level variables by way of lookup tables. Try: {{{ match files /file=LEFT /table=LOOKUP /by ID. }}} Each `/TABLE` subcommand takes one of: * the name of a data set * a valid filename or file handle In the case of a `MATCH FILE ... /TABLE` command, the key variable(s) only need to be unique across in each `/TABLE` dataset. In other words, each join could be guarded by: {{{ dataset activate LEFT. sort cases by ID. execute. dataset activate LOOKUP. sort cases by ID. compute dup=0. if (ID=lag(ID)) dup=1. select if dup=0. execute. }}} ---- == Update File == The '''`UPDATE FILE`''' command is used to overwrite values in a master file with non-null values in transaction files. Cases and variables not present in the master file are appended. Try: {{{ update file=MASTER /file=TRANSACTION1 /by ID. }}} === Files === Each `/file` subcommand takes one of: * a star (`*`) indicating the active data set * the name of a data set * a valid filename or file handle The active dataset can be specified on a `/file` subcommand with a star (`*`). That dataset would then be replaced in-place by the join. The first dataset specified on a `/file` subcommand is always the master file. === Key Variables === |
Line 53: | Line 139: |
dataset activate LEFT. sort cases by ID. compute dup=0. if (ID=lag(ID)) dup=1. select if dup=0. execute. dataset activate RIGHT. |
dataset activate MASTER. sort cases by ID. execute. dataset activate LOOKUP. |
Line 70: | Line 153: |
---- == Star Join == Note: unsupported in SPSS version 20 or earlier. ---- == Joins == |
|
Line 125: | Line 220: |
---- == Match Tables == The `MATCH FILE` command has an extension through the `/table` subcommand. This is useful for appending higher-level variables by way of lookup tables. For example, appending state-level statistics to individual-level data. === Syntax and Prerequisites === The basic syntax is: {{{ match files /file=MASTER /table=LOOKUP1 /by ID. }}} ==== Files ==== Each `/file` and `/table` subcommand takes one of: * the name of a data set * a valid filename or file handle If the active dataset is included in a join and referenced by a star (`*`), that dataset will be replaced in-place by the join. ==== IDs ==== All of the following must be satisfied by the key variable(s) named on the `/by` subcommand. 1. It must be defined as the same type on all files 2. Each file must be pre-sorted A key variable must be unique across all cases in all datasets specified on a `/table` subcommand. In other words, each join could be guarded by: {{{ dataset activate MASTER. sort cases by ID. execute. dataset activate LOOKUP. sort cases by ID. compute dup=0. if (ID=lag(ID)) dup=1. select if dup=0. execute. }}} If the key variable is a string, it must additionally be defined as the same length on all files. ---- == Update File == The '''`UPDATE FILE`''' command is used to overwrite values in a master file with non-null values in transaction files. Cases and variables not present in the master file are appended. === Syntax and Prerequisites === The basic syntax is: {{{ update file=MASTER /file=TRANSACTION1 /by ID. }}} ==== Files ==== Each `/file` subcommand takes one of: * a star (`*`) indicating the active data set * the name of a data set * a valid filename or file handle The active dataset can be specified on a `/file` subcommand with a star (`*`). That dataset would then be replaced in-place by the join. The first dataset specified on a `/file` subcommand is always the master file. ==== IDs ==== All of the following must be satisfied by the key variable(s) named on the `/by` subcommand. 1. It must be defined as the same type on all files 2. It must be unique in each file 3. Each file must be pre-sorted In other words, each join could be guarded by: {{{ dataset activate MASTER. sort cases by ID. execute. dataset activate LOOKUP. sort cases by ID. compute dup=0. if (ID=lag(ID)) dup=1. select if dup=0. execute. }}} If a key variable is a string, it must additionally be defined as the same length on all files. |
SPSS Joining Data
SPSS offers several commands for joining data.
The syntax of the commands is first explained, then the equivalent(s) of standard SQL joins are listed.
Contents
Match Files
The MATCH FILE command is primarily used for 1:1 joins, where all cases are uniquely identified in all datasets. Try:
match files /file=LEFT /file=RIGHT /by ID.
Files
Each /FILE subcommand takes one of:
a star (*) indicating the active data set
- the name of a data set
- a valid filename or file handle
If the active dataset is included in a join and referenced by a star (*), that dataset will be replaced in-place by the join.
Key Variables
All of the following must be satisfied by the key variable(s) named on the /BY subcommand.
It must be defined as the same format on all files
- It must be unique in each file
- Each file must be pre-sorted
In other words, each join could be guarded by:
dataset activate LEFT. sort cases by ID. compute dup=0. if (ID=lag(ID)) dup=1. select if dup=0. execute. dataset activate RIGHT. sort cases by ID. compute dup=0. if (ID=lag(ID)) dup=1. select if dup=0. execute.
If a key variable is a string, it must additionally be defined as the same length on all files.
Tables
The MATCH FILE command has an extension through the /TABLE subcommand. This is useful for appending higher-level variables by way of lookup tables. Try:
match files /file=LEFT /table=LOOKUP /by ID.
Each /TABLE subcommand takes one of:
- the name of a data set
- a valid filename or file handle
In the case of a MATCH FILE ... /TABLE command, the key variable(s) only need to be unique across in each /TABLE dataset. In other words, each join could be guarded by:
dataset activate LEFT. sort cases by ID. execute. dataset activate LOOKUP. sort cases by ID. compute dup=0. if (ID=lag(ID)) dup=1. select if dup=0. execute.
Update File
The UPDATE FILE command is used to overwrite values in a master file with non-null values in transaction files. Cases and variables not present in the master file are appended. Try:
update file=MASTER /file=TRANSACTION1 /by ID.
Files
Each /file subcommand takes one of:
a star (*) indicating the active data set
- the name of a data set
- a valid filename or file handle
The active dataset can be specified on a /file subcommand with a star (*). That dataset would then be replaced in-place by the join.
The first dataset specified on a /file subcommand is always the master file.
Key Variables
All of the following must be satisfied by the key variable(s) named on the /by subcommand.
- It must be defined as the same type on all files
- It must be unique in each file
- Each file must be pre-sorted
In other words, each join could be guarded by:
dataset activate MASTER. sort cases by ID. execute. dataset activate LOOKUP. sort cases by ID. compute dup=0. if (ID=lag(ID)) dup=1. select if dup=0. execute.
If a key variable is a string, it must additionally be defined as the same length on all files.
Star Join
Note: unsupported in SPSS version 20 or earlier.
Joins
Full Join
dataset activate LEFT. match files /file=* /file=RIGHT /by ID. execute.
Left Join
dataset activate LEFT. match files /file=* /in=flag_left /file=RIGHT /in=flag_right /by ID. select if (flag_left=1). execute.
Right Join
dataset activate LEFT. match files /file=* /in=flag_left /file=RIGHT /in=flag_right /by ID. select if (flag_right=1). execute.
Inner Join
dataset activate LEFT. match files /file=* /in=flag_left /file=RIGHT /in=flag_right /by ID. select if (flag_left=1 and flag_right=1). execute.