

SPSS Unicode

SPSS was originally written on the assumption that one byte equals one character. That does not hold for multibyte encodings, including the Unicode encodings such as UTF-8 and UTF-16. SPSS now works around this with a dedicated Unicode mode.

Setting the default encoding to Unicode further complicated things, in that Windows users were now forced to think about locale for the first time ever.

PSPP sidestepped this entire issue by observing locale.


Default Encoding

SPSS versions <21 default to the encoding prescribed by the system locale, bearing in mind that SPSS versions <16 do not support Unicode at all.

SPSS versions >=21 use Unicode by default.

SPSS servers inherit the default encoding from the connected client.


Reading Text Data

SPSS versions >=21 support an /ENCODING subcommand on GET DATA. Before version 21, all text data had to be encoded according to the system locale. When the subcommand was introduced, the only valid options were "LOCALE" and "UTF8".

SPSS version 23 added support for "UTF16", "UTF16BE", and "UTF16LE".
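
As a rough sketch of the subcommand described above, reading a UTF-8 delimited text file might look like the following. The file path, variable names, and formats are hypothetical placeholders; only the /ENCODING value is taken from the options listed above.

```
* Hypothetical file path and variables, for illustration only.
GET DATA
  /TYPE=TXT
  /FILE="C:\data\survey.csv"
  /ENCODING="UTF8"
  /ARRANGEMENT=DELIMITED
  /DELIMITERS=","
  /QUALIFIER='"'
  /FIRSTCASE=2
  /VARIABLES=id F8.0 name A40.
```

On versions <21, the /ENCODING line would have to be dropped and the file saved in the encoding of the system locale.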


Reading Binary Data

SPSS versions >=19 support an /ENCODING subcommand on GET SAS and GET STATA. Valid options include "LOCALE", "UTF8", "Windows-1252", and several other Windows and IBM codepages.
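
A minimal sketch of the same idea for binary files, assuming hypothetical file paths. The encoding values are taken from the options named above; which value is correct depends on how the source file was written.

```
* Hypothetical file paths, for illustration only.
GET SAS DATA="C:\data\legacy.sas7bdat"
  /ENCODING="Windows-1252".

GET STATA FILE="C:\data\legacy.dta"
  /ENCODING="UTF8".
```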


Encoding Override

The SET UNICODE command toggles Unicode mode. YES or ON enables the mode, while NO or OFF disables it.

If Unicode mode is disabled, SPSS tries to use the encoding prescribed by the system locale.

Note: not supported or needed in PSPP.
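
A short sketch of switching Unicode mode off in SPSS. It assumes that the mode can only be changed while no data are open, which is why the open datasets are closed first; treat that step as an assumption rather than a documented requirement of this page.

```
* Assumption: Unicode mode can only be changed when no data are open.
DATASET CLOSE ALL.
NEW FILE.

* Disable Unicode mode, then display the current setting.
SET UNICODE=OFF.
SHOW UNICODE.
```

After this, SPSS falls back to the encoding prescribed by the locale, as described above.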


Locale Override

To check the current locale, use the SHOW LOCALE command.

The SET LOCALE command overrides the system locale within the SPSS session. Try:

set locale="Japanese".

The allowed options for locale are called LocaleIDs, which are meant to follow the IANA character sets registry.

Note: the allowed LocaleIDs changed in a backward-incompatible way in SPSS version 16.

Note: for SPSS servers, the allowed LocaleIDs come from a configuration file (loclmap.xml). Notably, Windows-1252 is not included by default. The system administrator needs to edit this file to make additional LocaleIDs available.


CategoryRicottone

SPSS/Unicode (last edited 2023-05-30 19:35:40 by DominicRicottone)