SPSS String Functions
SPSS offers a minimal library of string functions.
General Syntax
SPSS was written with the assumption that 1 byte = 1 character. Unicode complicated things.
String functions that relied upon this assumption now have two versions in SPSS: NAME for the original (broken) implementation and CHAR.NAME for the updated implementation.
PSPP simply observes locales.
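The gap between bytes and characters can be illustrated outside SPSS. The sketch below is Python, not SPSS syntax, and assumes the string is stored as UTF-8; the variable names are made up for the illustration.

```python
# Illustration only: Python, not SPSS syntax. UTF-8 storage is an assumption.
text = "\u00e91"  # the two characters "é" and "1"

char_count = len(text)                  # number of characters (code points)
byte_count = len(text.encode("utf-8"))  # number of bytes once encoded

print(char_count, byte_count)
```

A function that counts bytes reports a different length for this string than one that counts characters, which is the distinction the NAME and CHAR.NAME variants draw.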
Concat
Index
Length
Lower
The LOWER function returns the string expression folded to lowercase characters.
Note that non-ASCII characters are casefolded losslessly.
{{{
data list /orig 1 (a).
begin data.
ß
á
Á
end data.
string new1 new2 (A1).
compute new1=upcase(orig).
compute new2=lower(new1).
list.
}}}
{{{
orig new1 new2
ß    ß    ß
á    Á    á
Á    Á    á
}}}
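In the listing above, applying UPCASE and then LOWER does not give back the original value for Á. The same effect can be sketched in Python, used here purely as an illustration; the helper name `upcase_then_lower` is hypothetical. ß is left out because Python's own case mapping for it differs from the output shown above.

```python
# Illustration only: Python, not SPSS syntax.
def upcase_then_lower(s: str) -> str:
    """Uppercase a string, then lowercase the result."""
    return s.upper().lower()

for orig in ["\u00e1", "\u00c1"]:  # á and Á
    # Columns mirror the listing: orig, new1 (uppercased), new2 (lowercased again)
    print(orig, orig.upper(), upcase_then_lower(orig))
```

Both inputs end up as á in the last column, so the original case cannot be recovered from the result.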
Lpad
Ltrim
Mblen
The CHAR.MBLEN function returns the number of bytes in the character at a character position.
The MBLEN.BYTE function returns the number of bytes in the character at a byte position.
In both cases, the first argument is a string expression and the second argument is an integer position.
Note: not supported in PSPP.
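The difference between the two position types can be sketched in Python. This is a rough model, not SPSS's implementation: it assumes UTF-8 storage and 1-based positions, and the behaviour for a byte position that falls inside a character is a choice made for this sketch only. The names `char_mblen` and `mblen_byte` are hypothetical.

```python
# Illustration only: Python, not SPSS syntax.
# Assumptions: UTF-8 storage, 1-based positions.
def char_mblen(s: str, pos: int) -> int:
    """Bytes used by the character at character position pos."""
    if pos < 1:
        raise ValueError("positions are 1-based")
    return len(s[pos - 1].encode("utf-8"))

def mblen_byte(s: str, pos: int) -> int:
    """Bytes used by the character that starts at byte position pos."""
    if pos < 1:
        raise ValueError("positions are 1-based")
    lead = s.encode("utf-8")[pos - 1]
    if lead < 0x80:   # single-byte (ASCII) character
        return 1
    if lead >= 0xF0:  # lead byte of a 4-byte sequence
        return 4
    if lead >= 0xE0:  # lead byte of a 3-byte sequence
        return 3
    if lead >= 0xC0:  # lead byte of a 2-byte sequence
        return 2
    # 0x80..0xBF is a continuation byte; rejecting it is this sketch's choice
    raise ValueError("pos points into the middle of a character")

sample = "a\u00e9\u20ac"  # "a", "é", "€"
print(char_mblen(sample, 3), mblen_byte(sample, 4))
```

In `sample`, the third character begins at the fourth byte, so the two calls describe the same character through different positions.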
Normalize
The NORMALIZE function returns the normalized form of a Unicode string expression.
Unicode normalization is defined by the Unicode Standard. In short: a sequence of combining characters that also has a precomposed equivalent is replaced with that equivalent. For example, é can be encoded as either U+00E9 or the combination of U+0065 and U+0301. Normalization replaces the latter with the former.
Note: if SPSS is not operating in Unicode mode, this function does nothing.
Note: not supported in PSPP.
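The é example can be reproduced with Python's standard `unicodedata` module. This is an illustration in Python, not SPSS syntax, and it assumes that the composing behaviour described above corresponds to Unicode normalization form NFC.

```python
# Illustration only: Python, not SPSS syntax.
# Assumption: the normalization described above corresponds to form NFC.
import unicodedata

decomposed = "e\u0301"  # U+0065 followed by combining acute accent U+0301
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))
print(composed == "\u00e9")
```

The two strings render identically but compare as different until one of them is normalized, which is why normalizing before comparing or matching strings is useful.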
Ntrim
Number
The NUMBER function returns the numeric representation of a string expression. The second argument is the format used to interpret the string.
The format's width determines how many characters are interpreted: number("1234", F3) reads only the first three characters and returns 123.
If the value is invalid according to the format, the function returns the system-missing value.
See also the STRING function.
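The width rule and the invalid-value rule can be modelled with a short Python sketch. It is a simplified stand-in, not SPSS's parser: it handles only plain decimal numbers, takes the width as an integer instead of a format such as F3, and uses `None` in place of the system-missing value. The function name `number` and the pattern `_PLAIN_NUMBER` are choices made for this illustration.

```python
# Illustration only: Python, not SPSS syntax.
# Assumptions: plain decimal input only; None stands in for system-missing.
import re
from typing import Optional

_PLAIN_NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)")

def number(text: str, width: int) -> Optional[float]:
    """Interpret the first `width` characters of text as a number."""
    field = text[:width].strip()
    if not _PLAIN_NUMBER.fullmatch(field):
        return None  # invalid under this simplified format
    return float(field)

print(number("1234", 3))
print(number("12a4", 4))
```

The first call sees only "123"; the second sees a non-numeric character inside the field and yields the missing marker.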
Replace
Rindex
Rpad
Rtrim
Strunc
Substr
Upcase
The UPCASE function returns the string expression folded to uppercase characters.
Note that non-ASCII characters are casefolded losslessly.
{{{
data list /orig 1 (a).
begin data.
ß
á
Á
end data.
string new1 new2 (A1).
compute new1=upcase(orig).
compute new2=lower(new1).
list.
}}}
{{{
orig new1 new2
ß    ß    ß
á    Á    á
Á    Á    á
}}}