= Stata String Functions =

Stata supports these '''string functions''' in the global scope.

<<TableOfContents>>

----



== General Purpose ==

||'''Function Name'''||'''Meaning'''                                                         ||'''Example'''||
||`abbrev(s,n)`      || || ||
||`plural(n,s)`      ||Append "s" to string s unless n is 1 or -1, otherwise return the original string s|| ||
||`plural(n,s,p)`    ||As `plural` but with p specifying the plural: the whole word, or a suffix edit such as `+es` or `-y+ies`|| ||
||`real(s)`          ||Convert string s to a real value                                      || ||
||`string(n)`        ||Convert numeric value n to a string                                   || ||
||`string(n,f)`      ||Convert numeric value n to a string using format f                    || ||
||`stritrim(s)`      ||Remove duplicated internal space characters                           || ||
||`strofreal(n)`     ||Convert numeric value n to a string                                   || ||
||`strofreal(n,f)`   ||Convert numeric value n to a string using format f                    || ||
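
As a brief illustration of the general purpose functions above, the following sketch can be run from a do-file. The literal values are arbitrary and chosen only for demonstration; the expected result of each call is noted in the comment above it.

{{{
* n is 1, so the word is returned unchanged: horse
display plural(1, "horse")
* n is not 1 or -1, so "s" is appended: horses
display plural(2, "horse")
* A leading "+" in the third argument appends a suffix: glasses
display plural(2, "glass", "+es")
* A leading "-" removes a suffix before appending: flies
display plural(2, "fly", "-y+ies")
* Convert a string to a number, then do arithmetic: 43
display real("42") + 1
* Convert a number to a string using a format: 4.00
display string(4, "%9.2f")
* Collapse runs of internal spaces to a single space: a b c
display stritrim("a   b  c")
}}}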

There is a large set of functions designed for string data representing ''strictly'' ASCII-encoded values.

||'''Function Name''' ||'''Meaning'''                                                         ||'''Example'''||
||`char(n)`           ||Character corresponding to ASCII code n                               || ||
||`indexnot(a,b)`     || || ||
||`lower(s)`          ||Convert to lowercase                                                  || ||
||`ltrim(s)`          ||Remove leading space characters                                       || ||
||`rtrim(s)`          ||Remove trailing space characters                                      || ||
||`soundex(s)`        || || ||
||`soundex_nara(s)`   || || ||
||`strlen(s)`         ||Length of string s in characters/bytes                                || ||
||`strlower(s)`       ||Convert to lowercase                                                  || ||
||`strltrim(s)`       ||Remove leading space characters                                       || ||
||`strpos(s,p)`       || || ||
||`strproper(s)`      ||Convert to proper case                                                || ||
||`strreverse(s)`     || || ||
||`strrpos(s,p)`      || || ||
||`strrtrim(s)`       ||Remove trailing space characters                                      || ||
||`strtrim(s)`        ||Remove external space characters                                      || ||
||`strupper(s)`       ||Convert to uppercase                                                  || ||
||`subinstr(s,p,r,n)` ||Replace the first n occurrences of substring p with replacement r (all occurrences if n is missing)|| ||
||`subinword(s,p,r,n)`|| || ||
||`substr(s,o)`       ||Return the substring of string s from offset o                        || ||
||`substr(s,o,n)`     ||Return the substring of string s from offset o for length n characters|| ||
||`trim(s)`           ||Remove external space characters                                      || ||
||`upper(s)`          ||Convert to uppercase                                                  || ||
||`word(s,n)`         || || ||
||`wordcount(s)`      || || ||
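
Several of the functions above are easiest to understand by example. The following is a minimal sketch using arbitrary literal strings; the expected result of each call is noted in the comment above it.

{{{
* Position of the first occurrence of "is": 3
display strpos("this is a test", "is")
* Position of the last occurrence of "is": 6
display strrpos("this is a test", "is")
* Replace only the first occurrence: thX is the day
display subinstr("this is the day", "is", "X", 1)
* A missing count replaces every occurrence: thX X the day
display subinstr("this is the day", "is", "X", .)
* subinword only matches whole words: this X the day
display subinword("this is the day", "is", "X", .)
* Second word: two
display word("one two three", 2)
* Negative indices count from the end: three
display word("one two three", -1)
* Number of words: 3
display wordcount("one two three")
* Capitalize the first letter of each word: Hello World
display strproper("hello world")
}}}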

These are the new functions designed for Unicode-encoded values. In many cases, they are named similarly except for a 'ustr-' prefix.

||'''Function Name''' ||'''Meaning'''                                                         ||'''Example'''||
||`uchar(n)`          ||Character corresponding to Unicode code point n                       || ||
||`udstrlen(s)`       ||Length of string s in display columns, respecting wide characters     || ||
||`udsubstr(s,o,n)`   ||Return the substring of string s from offset o for n display columns  || ||
||`uisdigit(s)`       || || ||
||`uisletter(s)`      || || ||
||`ustrcompare(a,b)`  || || ||
||`ustrcompare(a,b,l)`|| || ||
||`ustrleft(s,n)`     ||Return the leftmost substring of string s for length n characters     || ||
||`ustrlen(s)`        ||Length of string s in characters                                      || ||
||`ustrlower(s)`      ||Convert to lowercase                                                  || ||
||`ustrlower(s,l)`    ||Convert to lowercase in locale l                                      || ||
||`ustrltrim(s)`      || || ||
||`ustrpos(s,p)`      || || ||
||`ustrreverse(s)`    || || ||
||`ustrright(s,n)`    ||Return the rightmost substring of string s for length n characters    || ||
||`ustrrpos(s,p)`     || || ||
||`ustrrpos(s,p,o)`   || || ||
||`ustrrtrim(s)`      || || ||
||`ustrsortkey(s)`    || || ||
||`ustrsortkey(s,l)`  || || ||
||`ustrtitle(s)`      ||Convert to title case                                                 || ||
||`ustrtitle(s,l)`    ||Convert to title case in locale l                                     || ||
||`ustrtrim(s)`       ||Remove external whitespace characters                                 || ||
||`ustrupper(s)`      ||Convert to uppercase                                                  || ||
||`ustrupper(s,l)`    ||Convert to uppercase in locale l                                      || ||
||`ustrword(s,n)`     || || ||
||`ustrword(s,n,l)`   || || ||
||`ustrwordcount(s)`  || || ||
||`ustrwordcount(s,l)`|| || ||
||`usubinstr(s,p,r,n)`||Replace the first n occurrences of substring p with replacement r (all occurrences if n is missing)|| ||
||`usubstr(s,o,n)`    ||Return the substring of string s from offset o for length n characters|| ||
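
The practical difference between the two families shows up as soon as a string contains a non-ASCII character. The sketch below assumes Stata 14 or later, where strings are stored as UTF-8; the local macro `s` is a hypothetical name used only for illustration.

{{{
* Build "café" with uchar() so the example does not depend on file encoding
local s = "caf" + uchar(233)
* Length in bytes: 5, because é occupies two bytes in UTF-8
display strlen("`s'")
* Length in characters: 4
display ustrlen("`s'")
* Last character, counted in characters rather than bytes: é
display usubstr("`s'", -1, 1)
* Uppercase, including the accented letter: CAFÉ
display ustrupper("`s'")
}}}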

A couple of notes about the `substr` functions:

 * Negative offsets are interpreted as offsets from the end of the string value.
 * A missing length (`.`) is interpreted as the maximum: the substring runs until the end of the string value.

{{{
generate skip_first_character = usubstr(string, 2, .)
generate second_character = usubstr(string, 2, 1)
generate last_character = usubstr(string, -1, 1)
}}}



----



== Regular Expression Functions ==

There are two sets of regular expression functions. The first are the legacy functions designed for string data representing strictly ASCII-encoded values.

||'''Function Name'''||'''Meaning'''                                               ||'''Example'''                               ||
||`regexm(s,p)`      ||1 if string s matches pattern p, 0 otherwise                ||`regexm(zip5,"^[0-9][0-9][0-9][0-9][0-9]$")`||
||`regexr(s,p,r)`    ||Replace the first match to pattern p with replacement r     ||`regexr(filename,"\.(txt|csv|tsv)","")`     ||
||`regexs(n)`        ||The nth (in [0,9]) pattern match from the last `regexm` call|| ||

The second set are the new functions designed for Unicode-encoded values.

||'''Function Name'''   ||'''Meaning'''                                          ||'''Example'''                               ||
||`ustrregexm(s,p)`     ||1 if string s matches pattern p, 0 otherwise           || ||
||`ustrregexm(s,p,b)`   ||Call `ustrregexm` with case-insensitivity if b is 1    || ||
||`ustrregexrf(s,p,r)`  ||Replace the first match to pattern p with replacement r|| ||
||`ustrregexrf(s,p,r,b)`||Call `ustrregexrf` with case-insensitivity if b is 1   || ||
||`ustrregexra(s,p,r)`  ||Replace all matches to pattern p with replacement r    || ||
||`ustrregexra(s,p,r,b)`||Call `ustrregexra` with case-insensitivity if b is 1   || ||
||`ustrregexs(n)`       ||The nth pattern match from the last `ustrregexm` call  || ||

For `regexs` and `ustrregexs`, note that the 0th match is the entire portion of the string that matched the pattern; matches 1 and up are the parenthesized subexpressions.
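
The usual idiom pairs a match function with its companion subexpression function in a single `generate` statement. The following is a minimal sketch; the dataset, the variable `zip`, and the new variable names are all hypothetical.

{{{
* Build a tiny example dataset
clear
input str12 zip
"12345-6789"
"ABCDE"
end

* Legacy functions: capture the leading five digits as subexpression 1
generate zip5 = regexs(1) if regexm(zip, "^([0-9][0-9][0-9][0-9][0-9])")

* Unicode functions: the same capture with a shorter pattern
generate zip5u = ustrregexs(1) if ustrregexm(zip, "^(\d{5})")

* Replace every digit, not just the first one
generate masked = ustrregexra(zip, "\d", "#")

list
}}}

In this sketch, `zip5` and `zip5u` hold `12345` for the first observation and are empty for the second, because the `if` qualifier skips observations that do not match.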

See [[Stata/RegularExpressions|here]] for details on Stata's regular expressions syntax.

----



== Encoding and Decoding Functions ==

There are several functions meant for encoding or decoding string data.

||'''Function Name''' ||'''Meaning'''||
||`tobytes(s)`        || ||
||`tobytes(s,n)`      || ||
||`ustrfix(s)`        || ||
||`ustrfix(s,r)`      || ||
||`ustrfrom(s,e,m)`   || ||
||`ustrinvalidcnt(s)` || ||
||`ustrnormalize(s,m)`|| ||
||`ustrto(s,e,m)`     || ||
||`ustrtohex(s)`      || ||
||`ustrtohex(s,n)`    || ||
||`ustrunescape(s)`   || ||

----



== Locale Name Functions ==

Several of the above string functions take an optional ''locale name'' argument. This creates the need for more functions that can parse and validate locale names.

||'''Function Name'''   ||'''Meaning'''||
||`collatorlocale(l,t)` || ||
||`collatorversion(l)`  || ||
||`wordbreaklocale(s,n)`|| ||

----



== Stata Name Functions ==

Stata offers several functions for generating a safe name, as for use in generating variables or macros.

||'''Function Name''' ||'''Meaning'''             ||
||`strtoname(s)`      ||Create a Stata 13 name    ||
||`ustrtoname(s)`     ||Create a modern Stata name||

Both of these functions accept an optional second argument. Unless that argument is 0, a name whose first character is numeric is prefixed with an underscore character. In either case, characters that are not allowed in a name are converted to underscores.
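
As a short illustration of the optional second argument, the following sketch uses arbitrary literal strings; the expected result of each call is noted in the comment above it.

{{{
* Spaces are not allowed in names and become underscores: a_name
display strtoname("a name")
* Leading digit, second argument 1: an underscore is prefixed: _5_30
display strtoname("5:30", 1)
* Leading digit, second argument 0: the digit itself is replaced: __30
display strtoname("5:30", 0)
}}}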

----



== See also ==

[[https://www.stata.com/manuals/fnstringfunctions.pdf|Stata string functions]]



----
CategoryRicottone