= Stata String Functions =

Stata supports these '''string functions''' in the global scope.

----

== General Purpose ==

||'''Function Name'''||'''Meaning''' ||'''Example'''||
||`abbrev(s,n)` ||Abbreviate string s to a length of n || ||
||`plural(n,s)` ||Append "s" to string s if n is not 1 or -1, otherwise return the original string s || ||
||`plural(n,s,p)` ||As `plural` but specifying the plural form p explicitly || ||
||`real(s)` ||Convert string s to a real value || ||
||`string(n)` ||Convert numeric value n to a string || ||
||`string(n,f)` ||Convert numeric value n to a string using format f || ||
||`stritrim(s)` ||Collapse consecutive internal space characters into a single space || ||
||`strofreal(n)` ||Convert numeric value n to a string || ||
||`strofreal(n,f)` ||Convert numeric value n to a string using format f || ||

There is a large set of functions designed for string data representing ''strictly'' ASCII-encoded values.

||'''Function Name''' ||'''Meaning''' ||'''Example'''||
||`char(n)` ||Character with ASCII code n || ||
||`indexnot(a,b)` ||Position of the first character of a that is not found in b, or 0 if there is none || ||
||`lower(s)` ||Convert to lowercase || ||
||`ltrim(s)` ||Remove leading space characters || ||
||`rtrim(s)` ||Remove trailing space characters || ||
||`soundex(s)` || || ||
||`soundex_nara(s)` || || ||
||`strlen(s)` ||Length of string s in bytes (equal to the number of characters for ASCII data) || ||
||`strlower(s)` ||Convert to lowercase || ||
||`strltrim(s)` ||Remove leading space characters || ||
||`strpos(s,p)` ||Position of the first occurrence of p in s, or 0 if not found || ||
||`strproper(s)` ||Convert to proper case || ||
||`strreverse(s)` ||Reverse string s || ||
||`strrpos(s,p)` ||Position of the last occurrence of p in s, or 0 if not found || ||
||`strrtrim(s)` ||Remove trailing space characters || ||
||`strtrim(s)` ||Remove leading and trailing space characters || ||
||`strupper(s)` ||Convert to uppercase || ||
||`subinstr(s,p,r,n)` ||Replace the first n occurrences of substring p with replacement r (all occurrences if n is missing) || ||
||`subinword(s,p,r,n)`||Replace the first n occurrences of word p with replacement r || ||
||`substr(s,o)` ||Return the substring of string s from offset o || ||
||`substr(s,o,n)` ||Return the substring of string s from offset o for length n characters || ||
||`trim(s)` ||Remove leading and trailing space characters || ||
||`upper(s)` ||Convert to uppercase || ||
||`word(s,n)` ||Return the nth word of string s || ||
||`wordcount(s)` ||Number of words in string s || ||

These are the newer functions designed for Unicode-encoded values. In many cases, they share the name of an ASCII counterpart with a `u`, `ud`, or `ustr` prefix.

||'''Function Name''' ||'''Meaning''' ||'''Example'''||
||`uchar(n)` ||Character with Unicode code point n || ||
||`udstrlen(s)` ||Length of string s in display columns, respecting wide characters || ||
||`udsubstr(s,o,n)` ||Return the substring of string s from offset o for n display columns || ||
||`uisdigit(s)` || || ||
||`uisletter(s)` || || ||
||`ustrcompare(a,b)` ||Compare strings a and b, returning -1, 0, or 1 || ||
||`ustrcompare(a,b,l)`||Compare strings a and b in locale l || ||
||`ustrleft(s,n)` ||Return the leftmost substring of string s for length n characters || ||
||`ustrlen(s)` ||Length of string s in characters || ||
||`ustrlower(s)` ||Convert to lowercase || ||
||`ustrlower(s,l)` ||Convert to lowercase in locale l || ||
||`ustrltrim(s)` ||Remove leading whitespace characters || ||
||`ustrpos(s,p)` ||Position of the first occurrence of p in s, or 0 if not found || ||
||`ustrreverse(s)` ||Reverse string s || ||
||`ustrright(s,n)` ||Return the rightmost substring of string s for length n characters || ||
||`ustrrpos(s,p)` ||Position of the last occurrence of p in s, or 0 if not found || ||
||`ustrrpos(s,p,o)` || || ||
||`ustrrtrim(s)` ||Remove trailing whitespace characters || ||
||`ustrsortkey(s)` || || ||
||`ustrsortkey(s,l)` || || ||
||`ustrtitle(s)` ||Convert to title case || ||
||`ustrtitle(s,l)` ||Convert to title case in locale l || ||
||`ustrtrim(s)` ||Remove leading and trailing whitespace characters || ||
||`ustrupper(s)` ||Convert to uppercase || ||
||`ustrupper(s,l)` ||Convert to uppercase in locale l || ||
||`ustrword(s,n)` ||Return the nth word of string s || ||
||`ustrword(s,n,l)` ||Return the nth word of string s in locale l || ||
||`ustrwordcount(s)` ||Number of words in string s || ||
||`ustrwordcount(s,l)`||Number of words in string s in locale l || ||
||`usubinstr(s,p,r,n)`||Replace the first n occurrences of substring p with replacement r (all occurrences if n is missing) || ||
||`usubstr(s,o,n)` ||Return the substring of string s from offset o for length n characters || ||

A couple of notes about the `substr` functions:

 * A negative offset counts from the end of the string value.
 * A missing length (`.`) is interpreted as the maximum, so the substring runs to the end of the string value.
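
As a minimal sketch of why the choice between the byte-based and Unicode-aware families matters for these offsets, the following builds a short literal with `uchar` instead of reading a dataset. It assumes Stata 14 or later (where the Unicode functions exist) and uses a local macro named `s` purely for illustration.

{{{
* Build "café" with a precomposed é (U+00E9) so the byte count is unambiguous
local s = "caf" + uchar(233)
display strlen("`s'")             // 5: bytes, since é takes two bytes in UTF-8
display ustrlen("`s'")            // 4: Unicode characters
display usubstr("`s'", -1, 1)     // é: a negative offset counts from the end
display usubstr("`s'", 2, .)      // afé: a missing length reads to the end
}}}

By contrast, `substr("`s'", -1, 1)` returns only the final byte of `é`, which is not valid UTF-8 on its own. The same conventions applied to a variable: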
{{{
* string is the name of an existing string variable
generate skip_first_character = usubstr(string, 2, .)
generate second_character = usubstr(string, 2, 1)
generate last_character = usubstr(string, -1, 1)
}}}

----

== Regular Expression Functions ==

There are two sets of regular expression functions. The first are the legacy functions designed for string data representing strictly ASCII-encoded values.

||'''Function Name'''||'''Meaning''' ||'''Example''' ||
||`regexm(s,p)` ||1 if string s matches pattern p, 0 otherwise ||`regexm(zip5,"^[0-9][0-9][0-9][0-9][0-9]$")`||
||`regexr(s,p,r)` ||Replace the first match to pattern p with replacement r ||`regexr(filename,"\.(txt|csv|tsv)","")` ||
||`regexs(n)` ||The nth (in [0,9]) subexpression match from the last `regexm` call || ||

The second set are the newer functions designed for Unicode-encoded values.

||'''Function Name''' ||'''Meaning''' ||'''Example''' ||
||`ustrregexm(s,p)` ||1 if string s matches pattern p, 0 otherwise || ||
||`ustrregexm(s,p,b)` ||Call `ustrregexm` with case-insensitivity if b is 1 || ||
||`ustrregexrf(s,p,r)` ||Replace the first match to pattern p with replacement r || ||
||`ustrregexrf(s,p,r,b)`||Call `ustrregexrf` with case-insensitivity if b is 1 || ||
||`ustrregexra(s,p,r)` ||Replace all matches to pattern p with replacement r || ||
||`ustrregexra(s,p,r,b)`||Call `ustrregexra` with case-insensitivity if b is 1 || ||
||`ustrregexs(n)` ||The nth subexpression match from the last `ustrregexm` call || ||

For `regexs` and `ustrregexs`, note that the 0th match is the entire portion of the string that matched the pattern.

See [[Stata/RegularExpressions|here]] for details on Stata's regular expression syntax.

----

== Encoding and Decoding Functions ==

There are several functions meant for encoding or decoding string data.
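
As a minimal sketch of a typical use, suppose a hypothetical string variable `name_raw` was imported from a Latin-1 file. All variable names here are illustrative, and the mode argument of 1 is assumed to request that invalid sequences be substituted with a replacement character instead of being dropped.

{{{
* Count the invalid UTF-8 sequences in each observation before converting
generate n_invalid = ustrinvalidcnt(name_raw)
* Re-encode from Latin-1 to UTF-8
generate name_utf8 = ustrfrom(name_raw, "latin1", 1)
* Alternatively, keep the valid bytes and replace each invalid sequence with "?"
generate name_fixed = ustrfix(name_raw, "?")
}}}

Checking `n_invalid` first shows whether any conversion is needed at all; observations with a count of 0 are already valid UTF-8. The functions in this group are: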
||'''Function Name''' ||'''Meaning'''||
||`tobytes(s)` || ||
||`tobytes(s,n)` || ||
||`ustrfix(s)` ||Replace each invalid UTF-8 sequence in s with the Unicode replacement character ||
||`ustrfix(s,r)` ||Replace each invalid UTF-8 sequence in s with replacement r ||
||`ustrfrom(s,e,m)` ||Convert string s from encoding e to UTF-8, handling invalid sequences according to mode m ||
||`ustrinvalidcnt(s)` ||Number of invalid UTF-8 sequences in s ||
||`ustrnormalize(s,m)`|| ||
||`ustrto(s,e,m)` ||Convert UTF-8 string s to encoding e, handling unsupported characters according to mode m ||
||`ustrtohex(s)` || ||
||`ustrtohex(s,n)` || ||
||`ustrunescape(s)` || ||

----

== Locale Name Functions ==

Several of the above string functions take an optional ''locale name'' argument. This creates the need for more functions that can parse and validate locale names.

||'''Function Name''' ||'''Meaning'''||
||`collatorlocale(l,t)` || ||
||`collatorversion(l)` || ||
||`wordbreaklocale(s,n)`|| ||

----

== Stata Name Functions ==

Stata offers several functions for generating a safe name, as for use in generating variables or macros.

||'''Function Name''' ||'''Meaning''' ||
||`strtoname(s)` ||Create a Stata 13 compatible name ||
||`ustrtoname(s)` ||Create a modern Stata name ||

Both of these functions accept an optional second argument. If it is 1 (the default) and the first character of s is numeric, the returned name is prefixed with an underscore character.

----

== See also ==

[[https://www.stata.com/manuals/fnstringfunctions.pdf|Stata string functions]]

----
CategoryRicottone