Size: 4095
Revision 7 as of 2023-06-14 19:30:43
Size: 2153
Line 3: | Line 3: |
SPSS has a wide variety of commands for importing, parsing, or otherwise reading in data. | SPSS has a wide variety of commands for reading in external or embedded data. |
Line 11: | Line 11: |
== SPSS Formats == | == Embedded Data == |
Line 13: | Line 13: |
SPSS has an internal format syntax which is used throughout data import steps. Further details are available [[SPSS/Types|here]]. For the purposes of this page, all that needs to be understood is:
 * `A20` is a 20-wide string variable
 * `F8` is an 8-wide numeric variable
 * `F8.2` is an 8-wide numeric variable with 2 decimal places, e.g. `12345.78`
----
== Columnar Definition ==
The columnar definition of a fixed-width variable consists of:
 * the name
 * for a 1-wide variable: the column index
 * for a 2+-wide variable: the start and end column indices, separated by a dash (`-`)
 * the variable format within parentheses
If a variable format is not specified, the basic numeric format (`F`) is assumed.
Note that `GET DATA` does not fully comply with this standard:
 * `GET DATA` counts columns starting at 0, not 1
 * variables must have start and end column indices on a `GET DATA` command, so 1-wide variables will be specified like `1-1`
=== Decimals ===
A numeric variable's columnar definition can have a decimal place indicated with the variable format. A survey weight could be defined as `final_wt 1-8 (F, 5)` and would be imported as `F8.5`. Furthermore, because the numeric format is the default, this ''can'' be shortened to `(5)`, although that is not necessarily recommended.
=== Strings ===
A string variable's columnar definition would look like `Name 1-24 (A)`. A major caveat is that column indices are byte-wise: Unicode data is treated as discrete bytes rather than characters.
----
== FORTRAN Definition ==
----
== Data List ==
The `DATA LIST` command is used to read in arbitrary data. If the data is stored in an external file, reference it on a `/file` subcommand. If the data is entered in the syntax, it must be bounded by `BEGIN DATA` and `END DATA` statements.
=== Free ===
The `FREE` subcommand causes data to be read into rows and columns irrespective of record delimiters.
{{{
data list free / CaseNum (F2) Score (F3).
begin data
1, 10, 2, 40, 3, 15, 4, 10, 5, 15, 6,, 7, 25, 8, 10
end data.
}}}
This command results in the following dataset:
||'''!CaseNum'''||'''Score'''||
||1||10||
||2||40||
||3||15||
||4||10||
||5||15||
||6||||
||7||25||
||8||10||
If formats were not specified, these variables would be read in using the default (`F8.2`).
=== List ===
The `LIST` subcommand operates much the same as the `FREE` subcommand except that record delimiters matter. This is an equivalent syntax to the above example. |
The [[SPSS/DataList|DATA LIST]] command can be used to read data that is listed between '''`BEGIN DATA`''' and '''`END DATA`''' commands. |
Line 124: | Line 29: |
=== Fixed ===
The `FIXED` subcommand causes data to be read according to fixed-width columnar formats. This is an equivalent syntax to the above example.
{{{
data list fixed / CaseNum 1-2 Score 4-6.
begin data
1  10
2  40
3  15
4  10
5  15
6
7  25
8  10
end data.
}}}
Note that columns are numbered starting at 1 for `DATA LIST`, whereas `GET DATA` starts at 0. |
---- |
Line 150: | Line 33: |
==== Multi-record data ==== | == Fixed Width Data Files == |
Line 152: | Line 35: |
When cases are spread across multiple records, it is possible to read them in as a single row. | The `GET DATA` command can be used to read fixed-width data contained in an external file. |
Line 155: | Line 38: |
data list fixed record=2 /1 CaseNum 1-2 Score 4-6 /2 Time 1-4. 1 10 1200 2 40 0800 3 15 1600 |
get data /type=txt /file="path/to/data.txt" /arrangement=fixed /fixcase=1 /firstcase=2 /importcase=all /variables= VAR1 0-2 A VAR2 3 F. |
Line 164: | Line 50: |
This command results in the following dataset: | Variables should be specified with start and end column indices and a [[SPSS/DataFormats#Input_and_Output_Formats|format]]. If a variable occupies a single column, just specify the index once. |
Line 166: | Line 52: |
||'''!CaseNum'''||'''Score'''||'''Time'''|| ||1||10||1200|| ||2||40||800|| ||3||15||1600|| |
The [[SPSS/DataList#Fixed|DATA LIST FIXED]] command can also be used for this task.
=== Multi-Record Data ===
If data for a single case is found on multiple records (i.e. multiple lines), try:
{{{
get data
  /type=txt
  /file="path/to/data.txt"
  /arrangement=fixed
  /fixcase=2
  /firstcase=2
  /importcase=all
  /variables=
    /1 VAR1 0-2 A VAR2 3 F
    /2 VAR3 0-2 A VAR4 3 F.
}}}
Note that the '''`/FIXCASE`''' subcommand is set to `2`, the number of records per case, and that each record's variable definitions are introduced by its record number (`/1`, `/2`). |
Line 175: | Line 82: |
== Get Data == | == Delimited Data Files == |
Line 177: | Line 84: |
The `GET DATA` command is used to read in well-structured data. Examples for [[SPSS/FileIO#CSV|CSV]], [[SPSS/FileIO#Tab-delimited|tab-delimited]], and [[SPSS/FileIO#Fixed-width|fixed-width]] are available. | The `GET DATA` command can be used to read any type of delimited data. |
Line 179: | Line 86: |
Note that columns are numbered starting at 0 for `GET DATA`, whereas `DATA LIST` starts at 1. |
{{{
get data
  /type=txt
  /file="path/to/data.txt"
  /arrangement=delimited
  /delimiters=","
  /firstcase=2
  /importcase=all
  /variables=
    VAR1 A1
    VAR2 F2.
}}}
A [[SPSS/DataFormats#Print_Format|print and write format]] must be specified for each variable. Note however that numeric variables are read irrespective of the width and decimal places. It can still be meaningful to provide a date or time format to ensure accurate value parsing, and string formats must be accurately declared.
The [[SPSS/DataList#LIST|DATA LIST LIST]] command can also be used for this task. |
SPSS Reading Data
SPSS has a wide variety of commands for reading in external or embedded data.
Embedded Data
The DATA LIST command can be used to read data that is listed between BEGIN DATA and END DATA commands.
data list list / CaseNum (F2) Score (F3).
begin data
1, 10
2, 40
3, 15
4, 10
5, 15
6
7, 25
8, 10
end data.
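The difference between reading one case per record (as `DATA LIST LIST` does) and reading values irrespective of record boundaries (as `DATA LIST FREE` does) can be illustrated outside SPSS. The following Python sketch is only a simplified model, not SPSS code: the helper names `to_number`, `read_free`, and `read_list` are hypothetical, and it assumes values are separated by commas only.

```python
def to_number(token):
    """Convert a token to a float; an empty token becomes a missing value."""
    token = token.strip()
    return float(token) if token else None

def read_free(text, n_vars):
    """Model of FREE: line breaks are ignored, values fill cases in order."""
    tokens = [to_number(t) for t in text.replace("\n", " ").split(",")]
    return [tuple(tokens[i:i + n_vars]) for i in range(0, len(tokens), n_vars)]

def read_list(text, n_vars):
    """Model of LIST: each non-empty line is one case, short lines are padded."""
    cases = []
    for line in text.splitlines():
        if not line.strip():
            continue
        values = [to_number(t) for t in line.split(",")]
        values += [None] * (n_vars - len(values))  # pad missing trailing values
        cases.append(tuple(values[:n_vars]))
    return cases

free_text = "1, 10, 2, 40,\n6,, 7, 25"
list_text = "1, 10\n2, 40\n6\n7, 25"

free_cases = read_free(free_text, 2)
list_cases = read_list(list_text, 2)
```

In this model both inputs yield the same four cases, with the score for case 6 missing. In the free layout the gap has to be marked with two consecutive commas, whereas in the list layout the short record is enough.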
Fixed Width Data Files
The GET DATA command can be used to read fixed-width data contained in an external file.
get data
  /type=txt
  /file="path/to/data.txt"
  /arrangement=fixed
  /fixcase=1
  /firstcase=2
  /importcase=all
  /variables=
    VAR1 0-2 A
    VAR2 3 F.
Variables should be specified with start and end column indices and a format. If a variable occupies a single column, just specify the index once. Note that GET DATA numbers columns starting at 0, whereas DATA LIST starts at 1.
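To make the column arithmetic concrete, here is a small Python sketch (not SPSS code) of how a specification like `VAR1 0-2 A VAR2 3 F` maps onto one record. The function `parse_fixed` and the tuple layout of `specs` are hypothetical names chosen for illustration; the sketch assumes 0-based, inclusive start and end indices.

```python
def parse_fixed(line, specs):
    """Slice one record using (name, start, end, kind) specs.

    start and end are 0-based and inclusive.
    kind is "A" for a string or "F" for a number.
    """
    row = {}
    for name, start, end, kind in specs:
        raw = line[start:end + 1]  # inclusive end index, hence the +1
        if kind == "A":
            row[name] = raw.rstrip()
        else:
            text = raw.strip()
            row[name] = float(text) if text else None  # blank field -> missing
    return row

# VAR1 occupies columns 0-2; VAR2 occupies the single column 3.
specs = [("VAR1", 0, 2, "A"), ("VAR2", 3, 3, "F")]
row = parse_fixed("abc7", specs)
```

Here `row` holds `"abc"` for `VAR1` and `7.0` for `VAR2`. The same record described with 1-based indices would use columns 1-3 and 4.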
The DATA LIST FIXED command can also be used for this task.
Multi-Record Data
If data for a single case is found on multiple records (i.e. multiple lines), try:
get data
  /type=txt
  /file="path/to/data.txt"
  /arrangement=fixed
  /fixcase=2
  /firstcase=2
  /importcase=all
  /variables=
    /1 VAR1 0-2 A VAR2 3 F
    /2 VAR3 0-2 A VAR4 3 F.
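Note that the /FIXCASE subcommand is set to 2, the number of records per case, and that each record's variable definitions are introduced by its record number (/1, /2).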
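The idea of folding several records into one case can be sketched in Python as follows. This is an illustration rather than SPSS code: `group_records` is a hypothetical helper, the sample lines are invented, and the numeric fields are assumed to hold a single digit.

```python
def group_records(lines, records_per_case):
    """Yield tuples of consecutive lines, one tuple per complete case."""
    last_start = len(lines) - records_per_case
    for i in range(0, last_start + 1, records_per_case):
        yield tuple(lines[i:i + records_per_case])

lines = ["header", "abc1", "def2", "ghi3", "jkl4"]
data = lines[1:]  # like /firstcase=2: data begins on the second line

cases = [
    {
        "VAR1": rec1[0:3], "VAR2": int(rec1[3]),  # fields from record 1
        "VAR3": rec2[0:3], "VAR4": int(rec2[3]),  # fields from record 2
    }
    for rec1, rec2 in group_records(data, 2)
]
```

The four data lines collapse into two cases, each carrying two variables from its first record and two from its second.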
Delimited Data Files
The GET DATA command can be used to read delimited data, with the separator characters given on the /DELIMITERS subcommand.
get data
  /type=txt
  /file="path/to/data.txt"
  /arrangement=delimited
  /delimiters=","
  /firstcase=2
  /importcase=all
  /variables=
    VAR1 A1
    VAR2 F2.
A print and write format must be specified for each variable. Note, however, that numeric values are parsed regardless of the declared width and decimal places. It can still be meaningful to provide a date or time format to ensure accurate value parsing, and string formats must be declared with an accurate width.
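The asymmetry between numeric and string formats can be modeled with a short Python sketch using the standard `csv` module. It is not SPSS code: `read_delimited` and its parameters are hypothetical, and it assumes that a string longer than its declared width is truncated, which is one plausible reading of why string formats must be declared accurately.

```python
import csv
import io

def read_delimited(text, delimiter=",", first_case=2, str_width=1):
    """Read two-column delimited text, skipping lines before first_case.

    VAR1 is a string cut to str_width characters (assumed truncation).
    VAR2 is numeric and parsed in full, whatever width was declared.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    cases = []
    for var1, var2 in rows[first_case - 1:]:
        cases.append({
            "VAR1": var1[:str_width],
            "VAR2": float(var2) if var2.strip() else None,  # empty -> missing
        })
    return cases

sample = "name,score\na,10\nb,\nlong,123.456\n"
cases = read_delimited(sample)
```

In this model the value `123.456` survives intact even though it is wider than an `F2` declaration would suggest, while the string `long` is reduced to its first character under an `A1` declaration.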
The DATA LIST LIST command can also be used for this task.