Stata String Functions

Stata supports these string functions in the global scope.


General Purpose

Function Name

Meaning

Example

abbrev(s,n)

plural(n,s)

Append "s" to string s unless n is 1 or -1, in which case return the original string s

plural(n,s,p)

As plural(n,s), but forming the plural from p: either the plural word itself or a suffix rule such as "+es"

real(s)

Convert string s to a real value

string(n)

Convert numeric value n to a string

string(n,f)

Convert numeric value n to a string using format f

stritrim(s)

Collapse consecutive internal space characters into a single space

strofreal(n)

Convert numeric value n to a string

strofreal(n,f)

Convert numeric value n to a string using format f
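A few of the general-purpose functions in action. This is a minimal sketch; the literal values are illustrative only, and the expected results are shown in the comment lines.

```stata
* Pluralize a label depending on a count
display plural(1, "horse")
* horse
display plural(2, "horse")
* horses
display plural(2, "glass", "+es")
* glasses

* Convert between string and numeric values
display real("3.5") + 1
* 4.5
display strofreal(4, "%9.2f")
* 4.00

* Collapse repeated internal spaces
display stritrim("a   b  c")
* a b c
```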

There is a large set of functions designed for string data representing strictly ASCII-encoded values.

Function Name

Meaning

Example

char(n)

The character with ASCII code n

indexnot(a,b)

lower(s)

Convert to lowercase

ltrim(s)

Remove leading space characters

rtrim(s)

Remove trailing space characters

soundex(s)

soundex_nara(s)

strlen(s)

Length of string s in bytes (equal to the number of characters only for ASCII data)

strlower(s)

Convert to lowercase

strltrim(s)

Remove leading space characters

strpos(s,p)

strproper(s)

Convert to proper case

strreverse(s)

strrpos(s,p)

strrtrim(s)

Remove trailing space characters

strtrim(s)

Remove leading and trailing space characters

strupper(s)

Convert to uppercase

subinstr(s,p,r,n)

Replace the first n occurrences of the literal substring p with replacement r (all occurrences if n is missing)

subinword(s,p,r,n)

substr(s,o)

Return the substring of string s from offset o

substr(s,o,n)

Return the substring of string s from offset o for length n characters

trim(s)

Remove leading and trailing space characters

upper(s)

Convert to uppercase

word(s,n)

wordcount(s)
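As a rough illustration of the search, replace, and word functions listed above (the input strings are made up for the example):

```stata
* Positions are 1-based; 0 means "not found"
display strpos("hello world", "o")
* 5
display strrpos("hello world", "o")
* 8
display strpos("hello world", "z")
* 0

* subinstr matches literal text, not a pattern
display subinstr("a-b-c", "-", "+", 1)
* a+b-c
display subinstr("a-b-c", "-", "+", .)
* a+b+c

* Words are separated by spaces
display word("alpha beta gamma", 2)
* beta
display wordcount("alpha beta gamma")
* 3

* Brackets make the trimmed result visible
display "[" + strtrim("  padded  ") + "]"
* [padded]
```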

These are the new functions designed for Unicode-encoded values. In many cases, they share a name with their ASCII counterpart except for a 'u' or 'ustr' prefix.

Function Name

Meaning

Example

uchar(n)

The character with Unicode code point n

udstrlen(s)

Length of string s in display columns, respecting wide characters

udsubstr(s,o,n)

Return the substring of string s from offset o for n display columns

uisdigit(s)

uisletter(s)

ustrcompare(a,b)

ustrcompare(a,b,l)

ustrleft(s,n)

Return the leftmost substring of string s for length n characters

ustrlen(s)

Length of string s in characters

ustrlower(s)

Convert to lowercase

ustrlower(s,l)

Convert to lowercase in locale l

ustrltrim(s)

ustrpos(s,p)

ustrreverse(s)

ustrright(s,n)

Return the rightmost substring of string s for length n characters

ustrrpos(s,p)

ustrrpos(s,p,o)

ustrrtrim(s)

ustrsortkey(s)

ustrsortkey(s,l)

ustrtitle(s)

Convert to title case

ustrtitle(s,l)

Convert to title case in locale l

ustrtrim(s)

Remove leading and trailing whitespace characters

ustrupper(s)

Convert to uppercase

ustrupper(s,l)

Convert to uppercase in locale l

ustrword(s,n)

ustrword(s,n,l)

ustrwordcount(s)

ustrwordcount(s,l)

usubinstr(s,p,r,n)

Replace the first n occurrences of the literal substring p with replacement r (all occurrences if n is missing)

usubstr(s,o,n)

Return the substring of string s from offset o for length n characters

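A couple of notes about the substr functions: offsets are 1-based, a length of . (missing) means "through the end of the string", and a negative offset counts back from the end of the string.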

generate skip_first_character = usubstr(string, 2, .)
generate second_character = usubstr(string, 2, 1)
generate last_character = usubstr(string, -1, 1)
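The difference between the byte-based and Unicode-aware functions shows up as soon as a string contains a non-ASCII character. A small sketch, assuming a local macro named s (hypothetical) that is built with uchar() so the example does not depend on the encoding of the do-file:

```stata
* "caf" followed by U+00E9 (e with acute accent), which is 2 bytes in UTF-8
local s = "caf" + uchar(233)

* strlen counts bytes, ustrlen counts characters
display strlen("`s'")
* 5
display ustrlen("`s'")
* 4

* usubstr works in characters, so the accented letter stays intact
display usubstr("`s'", 2, .) == "af" + uchar(233)
* 1
display usubstr("`s'", -1, 1) == uchar(233)
* 1
```

Using substr on the same value can split the two-byte character, which is why the u-prefixed variants are the safer default for non-ASCII data.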


Regular Expression Functions

There are two sets of regular expression functions. The first are the legacy functions designed for string data representing strictly ASCII-encoded values.

Function Name

Meaning

Example

regexm(s,p)

1 if string s matches pattern p, 0 otherwise

regexm(zip5,"^[0-9][0-9][0-9][0-9][0-9]$")

regexr(s,p,r)

Replace the first match to pattern p with replacement r

regexr(filename,"\.(txt|csv|tsv)","")

regexs(n)

The nth subexpression (n in [0,9], where 0 is the entire matched text) from the last regexm call
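A short sketch of how regexm and regexs work together, and of regexr replacing only the first match. The local macro zip and its value are made up for illustration.

```stata
* Split a ZIP+4 code into its two parts
local zip "12345-6789"
if regexm("`zip'", "^([0-9]+)-([0-9]+)$") {
    * regexs(0) is the whole match; 1 and 2 are the parenthesized groups
    display regexs(0)
    * 12345-6789
    display regexs(1)
    * 12345
    display regexs(2)
    * 6789
}

* Only the first digit is replaced
display regexr("a1b2c3", "[0-9]", "#")
* a#b2c3
```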

The second set are the new functions designed for Unicode-encoded values.

Function Name

Meaning

Example

ustrregexm(s,p)

1 if string s matches pattern p, 0 otherwise

ustrregexm(s,p,b)

Call ustrregexm with case-insensitivity if b is 1

ustrregexrf(s,p,r)

Replace the first match to pattern p with replacement r

ustrregexrf(s,p,r,b)

Call ustrregexrf with case-insensitivity if b is 1

ustrregexra(s,p,r)

Replace all matches to pattern p with replacement r

ustrregexra(s,p,r,b)

Call ustrregexra with case-insensitivity if b is 1

ustrregexs(n)

The nth subexpression match from the last ustrregexm call

For ustrregexs, note that the 0th match is the entire portion of the string that matched the pattern, if it matched at all.
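The Unicode functions can be exercised the same way. This is an illustrative sketch with made-up inputs, contrasting replace-first with replace-all and showing the 0th match next to a captured group:

```stata
* Case-insensitive match when the third argument is 1
display ustrregexm("Hello World", "^hello", 1)
* 1
display ustrregexm("Hello World", "^hello")
* 0

* Replace the first match versus all matches
display ustrregexrf("a1b2c3", "[0-9]", "#")
* a#b2c3
display ustrregexra("a1b2c3", "[0-9]", "#")
* a#b#c#

* Group 0 is the matched text; group 1 is the parenthesized part
if ustrregexm("order 66 shipped", "([0-9]+) shipped") {
    display ustrregexs(0)
    * 66 shipped
    display ustrregexs(1)
    * 66
}
```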

See here for details on Stata's regular expressions syntax.


Encoding and Decoding Functions

There are several functions meant for encoding or decoding string data.

Function Name

Meaning

tobytes(s)

tobytes(s,n)

ustrfix(s)

ustrfix(s,r)

ustrfrom(s,e,m)

ustrinvalidcnt(s)

ustrnormalize(s,m)

ustrto(s,e,m)

ustrtohex(s)

ustrtohex(s,n)

ustrunescape(s)
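As a hedged sketch of how a few of these fit together: a valid UTF-8 string should report no invalid sequences, and escaping it with ustrtohex and then unescaping it should give back the original. The local macro s is hypothetical, and each display is expected to print 1 when the comparison holds.

```stata
* Build a string containing one non-ASCII character (U+00E9)
local s = "caf" + uchar(233)

* A well-formed UTF-8 string has no invalid byte sequences
display ustrinvalidcnt("`s'") == 0

* An escape sequence decodes to the corresponding character
display ustrunescape("\u00e9") == uchar(233)

* Escaping to hex and unescaping round-trips the string
display ustrunescape(ustrtohex("`s'")) == "`s'"
```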


Locale Name Functions

Several of the above string functions take an optional locale name argument. This creates the need for more functions that can parse and validate locale names.

Function Name

Meaning

collatorlocale(l,t)

collatorversion(l)

wordbreaklocale(s,n)


Stata Name Functions

Stata offers several functions for generating a safe name, as for use in generating variables or macros.

Function Name

Meaning

strtoname(s)

Create a Stata 13 name

ustrtoname(s)

Create a modern Stata name

Both of these functions accept an optional second argument. If it is 1 and the first character of s is numeric, the returned name is prefixed with an underscore character.
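For example, with arbitrary inputs chosen for illustration:

```stata
* Characters that are not allowed in a name become underscores
display strtoname("a name")
* a_name

* A leading digit gets an underscore prefix when the second argument is 1
display strtoname("5:30", 1)
* _5_30

* With 0, no prefix is added
display strtoname("5:30", 0)
* 5_30
```

Note that the result with a second argument of 0 starts with a digit, so it is not directly usable as a variable name.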


See also

Stata string functions


Stata/StringFunctions (last edited 2025-03-05 03:57:45 by DominicRicottone)