SPSS String Functions
SPSS supports these string functions in the global scope.
General Syntax
SPSS was written with the assumption that 1 byte = 1 character. Unicode complicated things.
String functions that relied on this assumption now come in two versions in SPSS: FUNCNAME, the original and now deprecated implementation, and CHAR.FUNCNAME, the updated implementation.
PSPP simply observes locales.
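As a rough sketch of the naming convention, the lines below call both versions of the length function on the same value. The variable names name, n_bytes and n_chars are hypothetical, and the two results would be expected to differ only when SPSS runs in Unicode mode and the value contains multibyte characters.

```spss
* Hypothetical variable: name is an existing string variable.
* LENGTH is the original version and CHAR.LENGTH the updated one.
compute n_bytes = length(name).
compute n_chars = char.length(name).
execute.
```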
Concat
Index
The CHAR.INDEX function returns the character position of a pattern inside a string expression, or 0 if the pattern is not found.
To extract the user and domain from an email address, try:
string user domain (A60).
do if char.index(email, "@") > 0.
compute user = char.substr(email, 1, char.index(email, "@") - 1).
compute domain = char.substr(email, char.index(email, "@") + 1).
end if.
Length
Lower
The LOWER function returns the string expression folded to lowercase characters.
Note that non-ASCII characters are case-folded as well. In the example below, ß is left unchanged by UPCASE.
data list /orig 1 (a).
begin data.
ß
á
Á
end data.
string new1 new2 (A1).
compute new1=upcase(orig).
compute new2=lower(new1).
list.

orig new1 new2
ß    ß    ß
á    Á    á
Á    Á    á
LPad
LTrim
The LTRIM function returns the string expression with leading whitespace trimmed.
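A minimal sketch of LTRIM in use, assuming a hypothetical existing string variable raw_id whose values may start with blanks. The target variable clean_id and its width are also assumptions made for illustration.

```spss
* Hypothetical variables: raw_id exists already, clean_id is created here.
string clean_id (A20).
compute clean_id = ltrim(raw_id).
execute.
```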
MbLen
The CHAR.MBLEN function returns the number of bytes in the character at a character position.
The MBLEN.BYTE function returns the number of bytes in the character at a byte position.
In both cases, the first argument is a string expression and the second argument is an integer position.
Note: not supported in PSPP.
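A minimal sketch, assuming a hypothetical existing string variable called name: it stores the byte count of the first character in a new numeric variable. In Unicode mode, a value that starts with é encoded as U+00E9 would be expected to give 2, since that character occupies two bytes in UTF-8.

```spss
* Hypothetical variable: name is an existing string variable.
* The second argument is the character position to inspect.
compute first_bytes = char.mblen(name, 1).
execute.
```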
Normalize
The NORMALIZE function returns the normalized form of a Unicode string expression.
Unicode normalization is described in Unicode Standard Annex #15. In short: a sequence of a base character and combining characters that also has a precomposed equivalent is replaced with that equivalent. For example, é can be encoded either as U+00E9 or as the combination of U+0065 and U+0301. Normalization replaces the latter with the former.
Note: if SPSS is not operating in Unicode mode, this function does nothing.
Note: not supported in PSPP.
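A minimal sketch, assuming a hypothetical existing string variable called name that may mix precomposed and combined forms. Normalizing before comparing or matching values is one plausible use. The target variable name_nfc and its width are assumptions.

```spss
* Hypothetical variables: name exists already, name_nfc is created here.
string name_nfc (A60).
compute name_nfc = normalize(name).
execute.
```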
NTrim
Number
The NUMBER function returns the numeric representation of a string expression. The second argument is the format used to interpret the string.
The format's width determines how many characters are interpreted. number("1234", F3) returns 123.
If the value is invalid according to the format, the function returns a system missing value.
See also the STRING function.
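A minimal sketch of converting a text column to a number, assuming a hypothetical existing string variable age_text. The F8 format is an arbitrary choice for illustration. Values that cannot be read under that format would be expected to come out as system missing, as described above.

```spss
* Hypothetical variables: age_text exists already, age is created here.
compute age = number(age_text, F8).
execute.
```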
Replace
The REPLACE function returns the string expression with all occurrences of a pattern substituted with a replacement.
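A minimal sketch of the basic call, assuming a hypothetical existing string variable phone that contains hyphens. The target variable phone_clean and its width are assumptions.

```spss
* Hypothetical variables: phone exists already, phone_clean is created here.
* Every hyphen is replaced with the empty string, which removes it.
string phone_clean (A20).
compute phone_clean = replace(phone, "-", "").
execute.
```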
To remove tabs and other tricky whitespace characters, try:
loop #i=09 to 13.
compute my_string=replace(my_string, string(#i,pib1), '').
end loop.
To remove all ASCII characters that are not alphanumeric (control characters, spaces, and punctuation), try:
loop #i=01 to 47.
compute my_string=replace(my_string, string(#i,pib1), '').
end loop.
loop #i=58 to 64.
compute my_string=replace(my_string, string(#i,pib1), '').
end loop.
loop #i=91 to 96.
compute my_string=replace(my_string, string(#i,pib1), '').
end loop.
loop #i=123 to 127.
compute my_string=replace(my_string, string(#i,pib1), '').
end loop.
RIndex
RPad
RTrim
The RTRIM function returns the string expression with trailing whitespace trimmed.
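A minimal sketch of a common use, assuming hypothetical existing string variables first_name and last_name. Trimming the first name before joining keeps its trailing padding out of the middle of the result. The target variable full_name and its width are assumptions.

```spss
* Hypothetical variables: first_name and last_name exist already.
* RTRIM removes the trailing blanks before the values are joined.
string full_name (A60).
compute full_name = concat(rtrim(first_name), " ", last_name).
execute.
```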
Strunc
Substr
The CHAR.SUBSTR function returns the substring of a string expression starting at a character position. If the optional third argument is specified, the substring is limited to that many characters.
To extract the user and domain from an email address, try:
string user domain (A60).
do if char.index(email, "@") > 0.
compute user = char.substr(email, 1, char.index(email, "@") - 1).
compute domain = char.substr(email, char.index(email, "@") + 1).
end if.
Upcase
The UPCASE function returns the string expression folded to uppercase characters.
Note that non-ASCII characters are case-folded as well. In the example below, ß is left unchanged by UPCASE.
data list /orig 1 (a).
begin data.
ß
á
Á
end data.
string new1 new2 (A1).
compute new1=upcase(orig).
compute new2=lower(new1).
list.

orig new1 new2
ß    ß    ß
á    Á    á
Á    Á    á