= SPSS String Functions =

SPSS supports these '''string functions''' in the global scope.

<>

----



== General Syntax ==

SPSS was originally written on the assumption that one byte equals one character. [[SPSS/Unicode|Unicode complicated things.]]

String functions that relied upon this assumption now have two versions in SPSS: `FUNCNAME` for the original, deprecated implementation and `CHAR.FUNCNAME` for the updated implementation.

PSPP simply observes locales.

----



== Concat ==

----



== Index ==

The '''`CHAR.INDEX`''' function returns the character position of a pattern inside a string expression, or 0 if the pattern is not found.

To extract the user and domain from an email address, try:

{{{
string user domain (A60).
do if char.index(email, "@")>0.
compute user = char.substr(email, 1, char.index(email, "@") - 1).
compute domain = char.substr(email, char.index(email, "@") + 1).
end if.
}}}

----



== Length ==

----



== Lower ==

The '''`LOWER`''' function returns the string expression folded to lowercase characters.

Note that non-ASCII characters are case-folded losslessly: in the example below, `ß` is left unchanged by `UPCASE` rather than being replaced.

{{{
data list /orig 1 (a).
begin data.
ß
á
Á
end data.

string new1 new2 (A1).
compute new1=upcase(orig).
compute new2=lower(new1).
list.
}}}

{{{
orig new1 new2
ß    ß    ß
á    Á    á
Á    Á    á
}}}

----



== LPad ==

----



== LTrim ==

The '''`LTRIM`''' function returns the string expression with leading whitespace trimmed.

----



== MbLen ==

The '''`CHAR.MBLEN`''' function returns the number of bytes in the character at a character position.

The '''`MBLEN.BYTE`''' function returns the number of bytes in the character at a byte position.

In both cases, the first argument is a string expression and the second argument is an integer position.

Note: not supported in PSPP.

----



== Normalize ==

The '''`NORMALIZE`''' function returns the ''normalized form'' of a Unicode string expression.

'''Unicode normalization''' is described [[https://en.wikipedia.org/wiki/Unicode_equivalence#Combining_and_precomposed_characters|here]].
In short: a sequence made of a base character and combining characters is replaced with its precomposed equivalent wherever one exists. For example, `é` can be encoded as either U+00E9 or the combination of U+0065 and U+0301. Normalization replaces the latter with the former.

Note: if SPSS is not operating in Unicode mode, this function does nothing.

Note: not supported in PSPP.

----



== NTrim ==

----



== Number ==

The '''`NUMBER`''' function returns the numeric representation of a string expression.

The second argument is the [[SPSS/ReadingData#SPSS_Formats|format]] used to interpret the string. The format's width determines how many characters are interpreted. `number("1234", F3)` returns `123`.

If the value is invalid according to the format, the function returns a system-missing value.

See also the [[SPSS/NumericFunctions#String|STRING function]].

----



== Replace ==

The '''`REPLACE`''' function returns the string expression with all occurrences of a pattern substituted with a replacement.

To strip tabs and other tricky whitespace characters (ASCII codes 9 through 13), try:

{{{
loop #i=09 to 13.
compute my_string=replace(my_string, string(#i,pib1), '').
end loop.
}}}

To strip all ASCII characters that are not alphanumeric, try:

{{{
loop #i=01 to 47.
compute my_string=replace(my_string, string(#i,pib1), '').
end loop.
loop #i=58 to 64.
compute my_string=replace(my_string, string(#i,pib1), '').
end loop.
loop #i=91 to 96.
compute my_string=replace(my_string, string(#i,pib1), '').
end loop.
loop #i=123 to 127.
compute my_string=replace(my_string, string(#i,pib1), '').
end loop.
}}}

----



== RIndex ==

----



== RPad ==

----



== RTrim ==

The '''`RTRIM`''' function returns the string expression with trailing whitespace trimmed.

----



== Strunc ==

----



== Substr ==

The '''`CHAR.SUBSTR`''' function returns the substring of a string expression starting at a character position. If the optional third argument is specified, the substring is limited to that many characters.
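
The two call forms can be sketched as follows. This is a minimal illustration, assuming a hypothetical string variable `code` that holds a two-letter prefix, a hyphen, and a number; the variable names and the data are made up for the example.

{{{
* Hypothetical data: one product code per case.
data list list /code (a10).
begin data
AB-12345
end data.

string prefix rest (a10).
* Three arguments: 2 characters starting at position 1, here "AB".
compute prefix = char.substr(code, 1, 2).
* Two arguments: everything from position 4 onward, here "12345".
compute rest = char.substr(code, 4).
list.
}}}

Fixed positions suit codes with a known layout. When the split point varies from case to case, the position is better computed with `CHAR.INDEX`.
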
To extract the user and domain from an email address, try:

{{{
string user domain (A60).
do if char.index(email, "@")>0.
compute user = char.substr(email, 1, char.index(email, "@") - 1).
compute domain = char.substr(email, char.index(email, "@") + 1).
end if.
}}}

----



== Upcase ==

The '''`UPCASE`''' function returns the string expression folded to uppercase characters.

Note that non-ASCII characters are case-folded losslessly: in the example below, `ß` is left unchanged by `UPCASE` rather than being replaced.

{{{
data list /orig 1 (a).
begin data.
ß
á
Á
end data.

string new1 new2 (A1).
compute new1=upcase(orig).
compute new2=lower(new1).
list.
}}}

{{{
orig new1 new2
ß    ß    ß
á    Á    á
Á    Á    á
}}}

----
CategoryRicottone