
SPSS String Functions

SPSS offers a minimal library of string functions.

General Syntax

SPSS was originally written on the assumption that 1 byte = 1 character. Unicode broke that assumption: in UTF-8, a single character can occupy from one to four bytes.

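String functions that relied on this assumption now come in two versions in SPSS: NAME for the original byte-based implementation (which is broken for multi-byte characters) and CHAR.NAME for the updated, character-based implementation. For example, LENGTH counts bytes while CHAR.LENGTH counts characters.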
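The difference is easy to see outside SPSS. The Python sketch below is an illustration, not SPSS code: it assumes UTF-8 (the encoding SPSS uses in Unicode mode), and the helper names byte_length and char_length are hypothetical. A byte count corresponds to the original 1 byte = 1 character view; a character count corresponds to the CHAR. view.

```python
def byte_length(s: str) -> int:
    """Length in bytes of the UTF-8 encoding (the 1 byte = 1 character view)."""
    return len(s.encode("utf-8"))


def char_length(s: str) -> int:
    """Length in characters (Unicode code points)."""
    return len(s)


for word in ["abc", "año", "ß€"]:
    # For pure ASCII the two counts agree; for other text they diverge.
    print(f"{word!r}: {byte_length(word)} bytes, {char_length(word)} characters")
```

For "año" the byte count is 4 but the character count is 3, because ñ takes two bytes in UTF-8.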

PSPP simply observes locales.

Concat

Index

Length

Lower

The LOWER function returns the string expression folded to lowercase characters.

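Note that non-ASCII characters are case-folded correctly and losslessly: converting a lowercase letter to uppercase and back returns the original letter. ß has no single-character uppercase mapping in Unicode, so UPCASE leaves it unchanged.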

data list /orig 1 (a).
begin data.
ß
á
Á
end data.
string new1 new2 (A1).
compute new1=upcase(orig).
compute new2=lower(new1).
list.

orig new1 new2
ß    ß    ß
á    Á    á
Á    Á    á
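As a rough Python model of the round trip above (an approximation, not SPSS's implementation), the hypothetical function upcase_simple keeps a character unchanged whenever its uppercase form is more than one character. Python's own str.upper() applies the full Unicode mapping and turns ß into SS, which would not fit the A1 variables in the example.

```python
def upcase_simple(s: str) -> str:
    """Upper-case each character, but keep it unchanged if its uppercase
    form is not a single character (e.g. 'ß'.upper() == 'SS')."""
    out = []
    for ch in s:
        up = ch.upper()
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


for orig in ["ß", "á", "Á"]:
    new1 = upcase_simple(orig)
    new2 = new1.lower()
    print(orig, new1, new2)
```

Run as written, this prints the same three rows as the SPSS listing above.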

Lpad

Ltrim

Mblen

The CHAR.MBLEN function returns the number of bytes in the character at a character position.

The MBLEN.BYTE function returns the number of bytes in the character at a byte position.

In both cases, the first argument is a string expression and the second argument is an integer position.
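A Python sketch of the two functions, assuming UTF-8 and 1-based positions as in SPSS. The names char_mblen and mblen_byte are hypothetical stand-ins. How SPSS handles a byte position in the middle of a character, or an out-of-range position, is not covered here; this sketch reports the width of the character containing the byte and raises IndexError for out-of-range positions.

```python
ENCODING = "utf-8"  # assumption: SPSS Unicode mode stores strings as UTF-8


def char_mblen(s: str, pos: int) -> int:
    """Bytes in the character at 1-based character position pos."""
    if not 1 <= pos <= len(s):
        raise IndexError("character position out of range")
    return len(s[pos - 1].encode(ENCODING))


def mblen_byte(s: str, pos: int) -> int:
    """Bytes in the character containing 1-based byte position pos."""
    start = 1  # 1-based byte offset where the current character begins
    for ch in s:
        width = len(ch.encode(ENCODING))
        if start <= pos < start + width:
            return width
        start += width
    raise IndexError("byte position out of range")
```

For "aé€", the characters start at bytes 1, 2 and 4, so char_mblen at position 3 and mblen_byte at position 4 both describe € (3 bytes).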

Note: not supported in PSPP.

Normalize

The NORMALIZE function returns the normalized form of a Unicode string expression.

Unicode normalization is described in Unicode Standard Annex #15 (Unicode Normalization Forms). In short: a character written as a base character followed by combining characters is replaced with its precomposed equivalent, if one exists; this corresponds to Normalization Form C (NFC). For example, é can be encoded either as U+00E9 or as U+0065 followed by U+0301. Normalization replaces the latter with the former.
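Python's standard unicodedata module can demonstrate the é example. This illustrates NFC normalization in general, not SPSS internals:

```python
import unicodedata

decomposed = "e\u0301"  # U+0065 LATIN SMALL LETTER E + U+0301 COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))  # 2 1
print(composed == "\u00e9")            # True
```

Both strings display as é, but they compare unequal until one of them is normalized.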

Note: if SPSS is not operating in Unicode mode, this function does nothing.

Note: not supported in PSPP.

Ntrim

Number

The NUMBER function returns the numeric value of a string expression. The second argument is the input format used to interpret the string.

The format's width determines how many characters are read. For example, NUMBER("1234", F3) returns 123.

If the string is not valid according to the format, the function returns the system-missing value.
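A simplified Python model of NUMBER with an F input format, written as a sketch: the function name number and its width parameter are hypothetical, None stands in for the system-missing value, and only plain decimal numbers are accepted (the decimal-places part of the format, exponents and other input features are ignored).

```python
import re
from typing import Optional

# Simplified pattern for a plain decimal number; real SPSS F input accepts
# more forms (e.g. exponents), which this sketch deliberately ignores.
_NUMBER_RE = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def number(s: str, width: int) -> Optional[float]:
    """Read the first `width` characters of s as a number.

    Returns None (standing in for system-missing) when the field is
    blank or not a valid number.
    """
    field = s[:width].strip()
    if _NUMBER_RE.fullmatch(field):
        return float(field)
    return None
```

Calling number("1234", 3) reads only "123" and returns 123.0, mirroring the F3 example above.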

Replace

Rindex

Rpad

Rtrim

Strunc

Substr

Upcase

The UPCASE function returns the string expression folded to uppercase characters.