Stata String Functions
Stata supports these string functions in the global scope.
General Purpose
Function Name |
Meaning |
Example |
abbrev(s,n) |
Abbreviate name s to a length of n characters |
|
plural(n,s) |
Append "s" to string s unless n is 1 or -1, in which case the original string s is returned |
|
plural(n,s,p) |
As plural but specifying the plural form p explicitly |
|
real(s) |
Convert string s to a real value |
|
string(n) |
Convert numeric value n to a string |
|
string(n,f) |
Convert numeric value n to a string using format f |
|
stritrim(s) |
Collapse consecutive internal space characters into a single space |
|
strofreal(n) |
Convert numeric value n to a string |
|
strofreal(n,f) |
Convert numeric value n to a string using format f |
|
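As a minimal sketch of the general-purpose functions above, the following lines can be run from a do-file (the `//` comments are not accepted at the interactive prompt). The literal strings are illustrative values only.

```stata
* Pluralize a label depending on a count
display plural(1, "apple")            // apple
display plural(2, "apple")            // apples
display plural(2, "mouse", "mice")    // mice

* Convert between strings and numbers
display real("3.14") + 1              // 4.14
display strofreal(4, "%9.2f")         // 4.00

* Collapse runs of internal spaces
display stritrim("a   b    c")        // a b c
```

Note that real() returns a missing value when the string cannot be parsed as a number, so it is worth checking for missing values after converting.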
There is a large set of functions designed for string data representing strictly ASCII-encoded values.
Function Name |
Meaning |
Example |
char(n) |
The character corresponding to ASCII code n |
|
indexnot(a,b) |
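Position of the first character in a that does not appear in b, or 0 if every character of a appears in b |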
|
lower(s) |
Convert to lowercase |
|
ltrim(s) |
Remove leading space characters |
|
rtrim(s) |
Remove trailing space characters |
|
soundex(s) |
Soundex code for name s |
|
soundex_nara(s) |
U.S. Census soundex code for name s |
|
strlen(s) |
Length of string s in bytes (equal to the number of characters for ASCII text) |
|
strlower(s) |
Convert to lowercase |
|
strltrim(s) |
Remove leading space characters |
|
strpos(s,p) |
Position of the first occurrence of substring p in s, or 0 if not found |
|
strproper(s) |
Convert to proper case |
|
strreverse(s) |
Reverse string s |
|
strrpos(s,p) |
Position of the last occurrence of substring p in s, or 0 if not found |
|
strrtrim(s) |
Remove trailing space characters |
|
strtrim(s) |
Remove leading and trailing space characters |
|
strupper(s) |
Convert to uppercase |
|
subinstr(s,p,r,n) |
Replace the first n occurrences of the literal substring p with replacement r; a missing n (.) replaces all occurrences |
|
subinword(s,p,r,n) |
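Replace the first n occurrences of p as a whole word with replacement r; a missing n (.) replaces all occurrences |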
|
substr(s,o,.) |
Return the substring of string s from offset o to the end of the string |
|
substr(s,o,n) |
Return the substring of string s from offset o for length n characters |
|
trim(s) |
Remove leading and trailing space characters |
|
upper(s) |
Convert to uppercase |
|
word(s,n) |
Return the nth word in string s |
|
wordcount(s) |
Number of words in string s |
|
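A short sketch of the most commonly used ASCII functions, using illustrative literal strings. It assumes the lines are run from a do-file so that the `//` comments are accepted. Positions and offsets are 1-based.

```stata
* Searching: position of the first and last occurrence of a substring
display strpos("this is a test", "is")      // 3
display strrpos("this is a test", "is")     // 6

* Literal (non-regex) replacement; a missing count replaces every occurrence
display subinstr("a-b-c", "-", "_", 1)      // a_b-c
display subinstr("a-b-c", "-", "_", .)      // a_b_c

* Substrings and words
display substr("abcdef", 2, 3)              // bcd
display word("one two three", 2)            // two
display wordcount("one two three")          // 3

* Case and trimming
display strproper("hello world")            // Hello World
display strtrim("  padded  ") + "|"         // padded|
```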
These are the new functions designed for Unicode-encoded values. In many cases, they are named similarly except for a 'ustr-' prefix.
Function Name |
Meaning |
Example |
uchar(n) |
The character corresponding to Unicode code point n |
|
udstrlen(s) |
Length of string s in display columns, respecting wide characters |
|
udsubstr(s,o,n) |
Return the substring of string s from offset o for n display columns |
|
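uisdigit(s) |
1 if the first Unicode character in s is a decimal digit, 0 otherwise |
|
uisletter(s) |
1 if the first Unicode character in s is a letter, 0 otherwise |
|
ustrcompare(a,b) |
Compare a and b; returns -1, 0, or 1 if a is less than, equal to, or greater than b |
|
ustrcompare(a,b,l) |
As ustrcompare but using the collation rules of locale l |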
|
ustrleft(s,n) |
Return the leftmost substring of string s for length n characters |
|
ustrlen(s) |
Length of string s in characters |
|
ustrlower(s) |
Convert to lowercase |
|
ustrlower(s,l) |
Convert to lowercase in locale l |
|
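ustrltrim(s) |
Remove leading whitespace characters |
|
ustrpos(s,p) |
Position of the first occurrence of substring p in s, or 0 if not found |
|
ustrreverse(s) |
Reverse the Unicode characters of string s |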
|
ustrright(s,n) |
Return the rightmost substring of string s for length n characters |
|
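ustrrpos(s,p) |
Position of the last occurrence of substring p in s, or 0 if not found |
|
ustrrpos(s,p,o) |
As ustrrpos but searching only the first o characters of s |
|
ustrrtrim(s) |
Remove trailing whitespace characters |
|
ustrsortkey(s) |
Sort key for s that sorts in the same order as ustrcompare |
|
ustrsortkey(s,l) |
As ustrsortkey but using the collation rules of locale l |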
|
ustrtitle(s) |
Convert to title case |
|
ustrtitle(s,l) |
Convert to title case in locale l |
|
ustrtrim(s) |
Remove leading and trailing whitespace characters |
|
ustrupper(s) |
Convert to uppercase |
|
ustrupper(s,l) |
Convert to uppercase in locale l |
|
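ustrword(s,n) |
Return the nth Unicode word in string s |
|
ustrword(s,n,l) |
As ustrword but using the word-boundary rules of locale l |
|
ustrwordcount(s) |
Number of Unicode words in string s |
|
ustrwordcount(s,l) |
As ustrwordcount but using the word-boundary rules of locale l |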
|
usubinstr(s,p,r,n) |
Replace the first n occurrences of the literal substring p with replacement r; a missing n (.) replaces all occurrences |
|
usubstr(s,o,n) |
Return the substring of string s from offset o for length n characters |
|
A couple of notes about the substr functions:
- Negative offsets are interpreted as offsets from the end of the string value.
- Missing lengths are interpreted as the maximum; read until the end of the string value.
generate skip_first_character = usubstr(string, 2, .)
generate second_character = usubstr(string, 2, 1)
generate last_character = usubstr(string, -1, 1)
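The practical difference between the ASCII and Unicode families shows up as soon as a string contains a multibyte character. The sketch below assumes Stata 14 or later, a do-file context, and that the "é" is stored as a single precomposed code point (two bytes in UTF-8).

```stata
* The ASCII functions count bytes; the Unicode functions count characters
display strlen("café")            // 5
display ustrlen("café")           // 4

* usubstr counts characters, so the accented letter survives intact
display usubstr("café", 1, 4)     // café
display usubstr("café", -1, 1)    // é

* The ASCII case functions leave non-ASCII letters alone
display strupper("café")          // CAFé
display ustrupper("café")         // CAFÉ
```

Applying substr() to such a string can cut a multibyte character in half and leave invalid UTF-8 behind, which is the main reason to prefer the ustr- functions for non-ASCII data.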
Regular Expression Functions
There are two sets of regular expression functions. The first are the legacy functions designed for string data representing strictly ASCII-encoded values.
Function Name |
Meaning |
Example |
regexm(s,p) |
1 if string s matches pattern p, 0 otherwise |
regexm(zip5,"^[0-9][0-9][0-9][0-9][0-9]$") |
regexr(s,p,r) |
Replace the first match to pattern p with replacement r |
regexr(filename,"\.(txt|csv|tsv)","") |
regexs(n) |
The nth subexpression (n in [0,9]) matched by the last regexm call; 0 is the entire matched substring |
|
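A minimal sketch of how regexm and regexs work together. The variable `phone` and its values are hypothetical, made up for illustration, and the lines are assumed to run from a do-file.

```stata
* Hypothetical data: one well-formed and one malformed phone number
clear
input str20 phone
"555-1234"
"5551234"
end

* Flag values that match the expected layout
generate byte ok = regexm(phone, "^([0-9][0-9][0-9])-([0-9][0-9][0-9][0-9])$")

* regexs(1) is the first parenthesized subexpression of the same match
generate prefix = regexs(1) if regexm(phone, "^([0-9][0-9][0-9])-([0-9][0-9][0-9][0-9])$")
list

* regexr replaces only the first match
display regexr("a.b.c", "\.", "-")    // a-b.c
```

Because regexs reads from the most recent regexm call, the usual idiom is to call both in the same statement, as in the second generate above.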
The second set are the new functions designed for Unicode-encoded values.
Function Name |
Meaning |
Example |
ustrregexm(s,p) |
1 if string s matches pattern p, 0 otherwise |
|
ustrregexm(s,p,b) |
Call ustrregexm with case-insensitivity if b is 1 |
|
ustrregexrf(s,p,r) |
Replace the first match to pattern p with replacement r |
|
ustrregexrf(s,p,r,b) |
Call ustrregexrf with case-insensitivity if b is 1 |
|
ustrregexra(s,p,r) |
Replace all matches to pattern p with replacement r |
|
ustrregexra(s,p,r,b) |
Call ustrregexra with case-insensitivity if b is 1 |
|
ustrregexs(n) |
The nth pattern match from the last ustrregexm call |
|
For ustrregexs, note that the 0th match is the entire substring that matched the pattern, while matches 1 and up are the parenthesized subexpressions.
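The Unicode regular expression functions can be sketched as follows, assuming Stata 14 or later and a do-file context. The literal strings are illustrative only.

```stata
* Replace all matches versus only the first match
display ustrregexra("a1b22c333", "[0-9]+", "#")    // a#b#c#
display ustrregexrf("a1b22c333", "[0-9]+", "#")    // a#b22c333

* A third argument of 1 makes the match case-insensitive
display ustrregexm("STATA", "stata", 1)            // 1

* Retrieve the whole match and the first subexpression
if ustrregexm("id=42; name=x", "id=([0-9]+)") {
    display ustrregexs(0)                          // id=42
    display ustrregexs(1)                          // 42
}
```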
See here for details on Stata's regular expressions syntax.
Encoding and Decoding Functions
There are several functions meant for encoding or decoding string data.
Function Name |
Meaning |
tobytes(s) |
|
tobytes(s,n) |
|
ustrfix(s) |
|
ustrfix(s,r) |
|
ustrfrom(s,e,m) |
|
ustrinvalidcnt(s) |
|
ustrnormalize(s,m) |
|
ustrto(s,e,m) |
|
ustrtohex(s) |
|
ustrtohex(s,n) |
|
ustrunescape(s) |
|
Locale Name Functions
Several of the above string functions take an optional locale name argument. This creates the need for more functions that can parse and validate locale names.
Function Name |
Meaning |
collatorlocale(l,t) |
|
collatorversion(l) |
|
wordbreaklocale(l,t) |
|
Stata Name Functions
Stata offers several functions for generating a safe name, as for use in generating variables or macros.
Function Name |
Meaning |
strtoname(s) |
Create a Stata 13 name |
ustrtoname(s) |
Create a modern Stata name |
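Both of these functions accept an optional second argument. Characters that are not allowed in a Stata name are converted to underscores. If the first character is numeric and the second argument is nonzero (the default is 1), the returned name is prefixed with an underscore character; pass 0 to suppress the prefix.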
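A brief sketch of the effect of the second argument, using an illustrative input string and assuming a do-file context:

```stata
* The space becomes an underscore; the leading digit triggers the prefix
display strtoname("1Var name")       // _1Var_name

* A second argument of 0 suppresses the prefix
display strtoname("1Var name", 0)    // 1Var_name
```

The unprefixed result starts with a digit and so is not itself a valid variable name; it is mainly useful when the result will be appended to some other valid prefix.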