= SPSS String Functions =

SPSS supports these '''string functions''' in the global scope. 

<<TableOfContents>>

----



== General Syntax ==

SPSS was written with the assumption that 1 byte = 1 character. [[SPSS/Unicode|Unicode complicated things.]]

String functions that relied upon this assumption now have two versions in SPSS: `FUNCNAME` for the original, deprecated, byte-based implementation and `CHAR.FUNCNAME` for the updated, character-based implementation.
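
As a rough illustration of the difference, the sketch below compares `LENGTH` with `CHAR.LENGTH`. The variable names are hypothetical, and it assumes SPSS is running in Unicode mode, where `é` occupies two bytes.

{{{
* Assumes Unicode mode; variable names are illustrative only.
data list list /word (A20).
begin data.
café
end data.
compute n_bytes = length(rtrim(word)).
compute n_chars = char.length(word).
list.
}}}

Under that assumption, `n_bytes` should come out one larger than `n_chars`.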

PSPP simply observes locales.

----



== Concat ==

----



== Index ==

The '''`CHAR.INDEX`''' function returns the character position of a pattern inside a string expression, or 0 if the pattern is not found.

To extract the user and domain from an email address, try:

{{{
string user domain (A60).
do if char.index(email, "@")>0.
  compute user =   char.substr(email, 1, char.index(email,'@') - 1).
  compute domain = char.substr(email, char.index(email,'@') + 1).
end if.
}}}

----



== Length ==

----



== Lower ==

The '''`LOWER`''' function returns the string expression folded to lowercase characters. 

Note that non-ASCII characters are casefolded losslessly.

{{{
data list /orig 1 (a).
begin data.
ß
á
Á
end data.
string new1 new2 (A1).
compute new1=upcase(orig).
compute new2=lower(new1).
list.
}}}

{{{
orig new1 new2 
 
ß    ß    ß 
á    Á    á 
Á    Á    á
}}}

----



== LPad ==

----



== LTrim ==

The '''`LTRIM`''' function returns the string expression with leading whitespace trimmed.
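
An optional second argument names a single character to trim instead of whitespace. A minimal sketch, assuming a hypothetical string variable `id` that holds zero-padded codes:

{{{
* id is a hypothetical string variable such as "000123".
compute id = ltrim(id, "0").
}}}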

----



== MbLen ==

The '''`CHAR.MBLEN`''' function returns the number of bytes in the character at a character position.

The '''`MBLEN.BYTE`''' function returns the number of bytes in the character at a byte position.

In both cases, the first argument is a string expression and the second argument is an integer position.

Note: not supported in PSPP.
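
A minimal sketch of `CHAR.MBLEN`, assuming SPSS is running in Unicode mode; the variable names are hypothetical.

{{{
* Assumes Unicode mode; variable names are illustrative only.
data list list /word (A20).
begin data.
café
end data.
compute b1 = char.mblen(word, 1).
compute b4 = char.mblen(word, 4).
list.
}}}

Under that assumption, `b1` should be 1 (for `c`) and `b4` should be 2 (for `é`).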

----



== Normalize ==

The '''`NORMALIZE`''' function returns the ''normalized form'' of a Unicode string expression.

'''Unicode normalization''' is described [[https://en.wikipedia.org/wiki/Unicode_equivalence#Combining_and_precomposed_characters|here]]. In short: a sequence of a base character and combining characters that also has a precomposed equivalent is replaced with that equivalent. For example, `é` can be encoded as either U+00E9 or the combination of U+0065 and U+0301. Normalization replaces the latter with the former.

Note: if SPSS is not operating in Unicode mode, this function does nothing.

Note: not supported in PSPP.
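
One plausible use is comparing strings that may have been encoded differently. A minimal sketch, assuming two hypothetical string variables `name_a` and `name_b`:

{{{
* name_a and name_b are hypothetical string variables.
compute same = (normalize(name_a) = normalize(name_b)).
}}}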

----



== NTrim ==

----



== Number ==

The '''`NUMBER`''' function returns the numeric representation of a string expression. The second argument is the [[SPSS/ReadingData#SPSS_Formats|format]] used to interpret the string.

The format's width determines how many characters are interpreted. `number("1234", F3)` returns `123`.

If the value is invalid according to the format, the function returns a system missing value.
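
The two rules above can be sketched together as follows. The variable names are hypothetical.

{{{
* raw is a hypothetical string variable.
data list list /raw (A8).
begin data.
1234
12x4
end data.
compute n3 = number(raw, F3).
compute n8 = number(raw, F8).
list.
}}}

For the first case, `n3` should be 123 and `n8` should be 1234. For the second case, both should be system missing, because `12x` and `12x4` are not valid numbers.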

See also the [[SPSS/NumericFunctions#String|STRING function]].

----



== Replace ==

The '''`REPLACE`''' function returns the string expression with all occurrences of a pattern substituted with a replacement.

To remove tabs and other tricky whitespace characters (ASCII codes 9 through 13), try:

{{{
loop #i=09 to 13.
compute my_string=replace(my_string, string(#i,pib1), '').
end loop.
}}}

To remove all ASCII characters that are not alphanumeric (including spaces), try:

{{{
loop #i=01 to 47.
compute my_string=replace(my_string, string(#i,pib1), '').
end loop.
loop #i=58 to 64.
compute my_string=replace(my_string, string(#i,pib1), '').
end loop.
loop #i=91 to 96.
compute my_string=replace(my_string, string(#i,pib1), '').
end loop.
loop #i=123 to 127.
compute my_string=replace(my_string, string(#i,pib1), '').
end loop.
}}}

----



== RIndex ==

----



== RPad ==

----



== RTrim ==

The '''`RTRIM`''' function returns the string expression with trailing whitespace trimmed.
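
Trimming matters most when joining fixed-width string variables, since their values are padded with trailing blanks. A minimal sketch, assuming hypothetical string variables `first` and `last`:

{{{
* first and last are hypothetical string variables.
string full (A60).
compute full = concat(rtrim(first), " ", last).
}}}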

----



== Strunc ==

----



== Substr ==

The '''`CHAR.SUBSTR`''' function returns the substring of a string expression starting at a character position. If the optional third argument is specified, the substring stops at that character length.

To extract the user and domain from an email address, try:

{{{
string user domain (A60).
do if char.index(email, "@")>0.
  compute user =   char.substr(email, 1, char.index(email,'@') - 1).
  compute domain = char.substr(email, char.index(email,'@') + 1).
end if.
}}}

----



== Upcase ==

The '''`UPCASE`''' function returns the string expression folded to uppercase characters.

Note that non-ASCII characters are casefolded losslessly.

{{{
data list /orig 1 (a).
begin data.
ß
á
Á
end data.
string new1 new2 (A1).
compute new1=upcase(orig).
compute new2=lower(new1).
list.
}}}

{{{
orig new1 new2 
 
ß    ß    ß 
á    Á    á 
Á    Á    á
}}}



----
CategoryRicottone