= SPSS Unicode =

SPSS was written on the assumption that 1 character = 1 byte. That does not hold for many character encodings, including the Unicode encodings. [[SPSS/StringFunctions#General_Syntax|A workaround now exists]].
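
As a rough illustration of the mismatch, the sketch below compares the byte-oriented `LENGTH` function with the character-oriented `CHAR.LENGTH` function. The variable names and the sample value are hypothetical, and the sketch assumes Unicode mode and a syntax file saved in an encoding SPSS reads correctly.

{{{
* Hypothetical one-case dataset with a non-ASCII value.
data list list /name (a20).
begin data
"café"
end data.

* LENGTH counts bytes, while CHAR.LENGTH counts characters.
compute n_bytes = length(rtrim(name)).
compute n_chars = char.length(name).
formats n_bytes n_chars (f3.0).
list.
}}}

If the two computed values differ, at least one character in the string occupies more than one byte.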

Setting the default encoding to Unicode ''further'' complicated things, in that Windows users were now forced to think about locale for the first time ever.

PSPP sidestepped this entire issue by observing locale.

<<TableOfContents>>

----



== Default Encoding ==

SPSS versions <21 default to the encoding prescribed by the system locale, bearing in mind that SPSS versions <16 do not support Unicode at all.

SPSS versions >=21 use Unicode by default.

SPSS servers inherit the encoding from the connected client.



=== Encoding Override ===

The `SET UNICODE` command toggles Unicode mode. `YES` or `ON` enables the mode, while `NO` or `OFF` disables it.

If Unicode mode is disabled, SPSS tries to use the encoding prescribed by the system locale.

Unicode mode cannot be altered while a data file is open. Try:

{{{
dataset close all.
new file.
set unicode=on.
show unicode.
}}}

Note: `SET UNICODE` is neither supported nor needed in PSPP.



=== Locale Override ===

To check the current locale, use the `SHOW LOCALE` command.

The `SET LOCALE` command overrides the system locale within the SPSS session. Try:

{{{
set locale="Japanese".
}}}

The allowed options for locale are called ''LocaleIDs'', which are meant to follow the [[https://www.iana.org/assignments/character-sets/character-sets.xhtml|IANA character sets registry]].

Note: the allowed LocaleIDs changed ''without backward compatibility'' in SPSS version 16.

Note: for SPSS servers, the allowed LocaleIDs come from a configuration file (`loclmap.xml`). Notably, Windows-1252 is not included by default. A system administrator must edit this file to make additional LocaleIDs available.

----



== Text Data ==

=== Reading Files ===

SPSS versions >=21 support an `/ENCODING` subcommand on `GET DATA`. Before version 21, all text data had to be encoded according to the system locale. Initially, the only valid options were `"LOCALE"` and `"UTF8"`.

SPSS version 23 added support for `"UTF16"`, `"UTF16BE"`, and `"UTF16LE"`.
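
A minimal sketch of reading a UTF-8 delimited text file follows. The file path, delimiter, and variable definitions are hypothetical and need to be adapted to the actual file.

{{{
* Read a comma-delimited UTF-8 file with a header row (hypothetical path and variables).
get data
  /type=txt
  /file="C:\data\survey.csv"
  /encoding="UTF8"
  /delimiters=","
  /qualifier='"'
  /arrangement=delimited
  /firstcase=2
  /variables=id f8.0 name a40.
}}}

On versions that support them, one of the UTF-16 options can be substituted for `"UTF8"` in the same position.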



=== Writing Files ===

SPSS versions >=19 support an `/ENCODING` subcommand on `SAVE TRANSLATE` with a `/TYPE` of `CSV` or `TAB`. Valid options are `"LOCALE"`, `"UTF8"`, `"UTF16"`, `"UTF16BE"`, `"UTF16LE"`, a numeric Windows code page value (such as `"1252"`), or an IANA code page value (such as `"iso8859-1"`).

If Unicode mode is enabled, the default is `"UTF8"`. (Otherwise it defaults to `"LOCALE"`.)
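
A minimal sketch of writing the active dataset to a delimited text file with an explicit encoding follows. The output path is hypothetical, and `/TYPE=CSV` is assumed here as the text format.

{{{
* Write the active dataset as UTF-8 text with variable names in the first row.
save translate
  /outfile="C:\data\survey_out.csv"
  /type=csv
  /encoding="UTF8"
  /fieldnames
  /replace.
}}}

Stating the encoding explicitly keeps the output the same whether or not the session is in Unicode mode.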

----



== Binary Data ==

PSPP does not support proprietary binary data formats.



=== Reading Files ===

SPSS versions >=19 support an `/ENCODING` subcommand on `GET SAS` and `GET STATA`. Valid options include `"LOCALE"`, `"UTF8"`, `"Windows-1252"`, and several other Windows and IBM codepages.
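
A minimal sketch of both commands follows. The file paths are hypothetical, and the encodings shown are only examples of the options listed above.

{{{
* Read a Stata file that was written in a Windows code page.
get stata file="C:\data\survey.dta"
  /encoding="Windows-1252".

* Read a SAS data file that was written in UTF-8.
get sas data="C:\data\survey.sas7bdat"
  /encoding="UTF8".
}}}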



=== Writing Files ===

SPSS versions >=19 support an `/ENCODING` subcommand on `SAVE TRANSLATE` with a `/TYPE` of `SAS` or `STATA`. Valid options include `"LOCALE"`, `"UTF8"`, `"Windows-1252"`, and several other Windows and IBM code pages. For SAS exports, the encoding applies to both the data file (`/OUTFILE`) and the value labels file (`/VALFILE`).

For Stata, and for SAS versions <9, the default is always `"LOCALE"`. For SAS versions >=9, the default is `"UTF8"` if Unicode mode is enabled. (Otherwise it defaults to `"LOCALE"`.)
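
A minimal sketch of a SAS export with an explicit encoding follows. The file paths are hypothetical, and the `/VERSION` and `/PLATFORM` values are assumptions chosen for illustration.

{{{
* Export to SAS 9 format, writing the data file and value labels file as UTF-8.
save translate
  /outfile="C:\data\survey.sas7bdat"
  /type=sas
  /version=9
  /platform=windows
  /encoding="UTF8"
  /valfile="C:\data\survey_labels.sas"
  /replace.
}}}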

Note: SPSS version 25 introduced interoperability with Stata 14, which is the first version of Stata to support Unicode.



----
CategoryRicottone