Stata String Functions

Stata supports these string functions in the global scope.


Abbrev


Char


CollatorLocale


CollatorVersion


IndexNote


Lower

Deprecated name for strlower.


LTrim

Deprecated name for strltrim.


Plural


Real


RegexM

Match a string against a pattern. Returns 1 if the string matches and 0 otherwise.

The string must not contain a null byte (char(0)). While fixed-length strings cannot contain a null byte by design, long strings (strL) can. To get around this restriction, consider ustrregexm.

generate byte begins_with_number = regexm(string, "^[0-9]")

See here for details on Stata's regular expressions.


RegexR

Match a string against a pattern and replace the first matching substring with a replacement substring.

The string must not contain a null byte (char(0)). While fixed-length strings cannot contain a null byte by design, long strings (strL) can. Returned substrings can be up to 1,100,000 bytes long. To get around these restrictions, consider ustrregexrf.

To replace more than just the first matching substring, consider ustrregexra.

generate filename_without_extension = regexr(filename, "\.(txt|csv|tsv)$", "")
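
To illustrate the difference between replacing the first match and replacing every match, here is a minimal sketch using literal strings rather than variables. The input "a1b2c3" is made up for illustration.

* regexr removes only the first digit, giving ab2c3
display regexr("a1b2c3", "[0-9]", "")
* ustrregexra removes every digit, giving abc
display ustrregexra("a1b2c3", "[0-9]", "")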

See here for details on Stata's regular expressions.


RegexS

Extract the nth matching subexpression (a parenthesized group in the pattern) from a prior regexm test. The 0th match is the entire portion of the string that matched the pattern.

Only the first 9 matching substrings are stored and available. Returned substrings can be up to 1,100,000 bytes long. To get around these restrictions, consider ustrregexs.

generate byte is_pipe_delimited = regexm(string, "^([^|]*)[|]")
generate first_field = regexs(1) if regexm(string, "^([^|]*)[|]")
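
Because regexs reads the result of the most recent regexm evaluation, it is commonly combined with regexm in a single command through an if qualifier. A minimal sketch with several subexpressions, assuming a hypothetical string variable date_string holding values like 2023-06-13:

* Each parenthesized group is numbered left to right, starting at 1
generate year = regexs(1) if regexm(date_string, "^([0-9]+)-([0-9]+)-([0-9]+)$")
generate month = regexs(2) if regexm(date_string, "^([0-9]+)-([0-9]+)-([0-9]+)$")
generate day = regexs(3) if regexm(date_string, "^([0-9]+)-([0-9]+)-([0-9]+)$")

Observations that do not match the pattern are left as empty strings.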

See here for details on Stata's regular expressions.


RTrim

Deprecated name for strrtrim.


Soundex


Soundex_Nara


String

Alias for strofreal.


StrITrim


StrLen


StrLower


StrLTrim


StrOfReal


StrPos


StrProper


StrReverse


StrRPos


StrRTrim


StrToName


StrTrim


StrUpper


SubInStr


SubInWord


SubStr

Extract a substring from a string using a start argument and an optional length argument, as substr(string, start, length). If the optional length argument is left off or set to the missing value (.), the extraction continues to the end of the string.

generate skip_first_character = substr(string, 2)
generate skip_first_character = substr(string, 2, .)
generate second_character = substr(string, 2, 1)
generate last_character = substr(string, -1, 1)

The start and length parameters are byte positions rather than character indices. This makes no difference for ASCII data but matters whenever a character occupies more than one byte, as non-ASCII characters do in UTF-8. If the optional length argument is left off and a null byte (char(0)) is encountered between the start byte position and the end of the string, the extraction ends at that null byte (excluding the null byte). To get around these restrictions, consider usubstr.
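
A brief illustration of the byte versus character distinction, assuming a version of Stata that stores strings as UTF-8 and assuming é is the single precomposed character U+00E9, which occupies two bytes:

* 5 bytes: c, a, f, plus two bytes for é
display strlen("café")
* 4 characters
display ustrlen("café")
* The fourth character, é, extracted whole
display usubstr("café", 4, 1)

Calling substr("café", 4, 1) instead would return only the first of the two bytes that make up é, which is not a valid character on its own.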


ToBytes


Trim

Deprecated name for strtrim.


UChar


UIsDigit


UIsLetter


Upper

Deprecated name for strupper.


UStrCompare


UStrCompareEx


UStrFix


UStrFrom


UStrInvalidCnt


UStrLeft

Extract the first n characters from a string.

generate first_two = ustrleft(string, 2)


UStrLen

Return the length of a Unicode string in characters. Every character counts as one, including wide characters that occupy two columns when displayed. To get the number of display columns instead, for example to align text in a fixed-width font, a variant named udstrlen is also available.
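
As a small sketch of how the length functions differ, consider a string of two wide characters. The literal is chosen only for illustration, and each character is assumed to be stored as three bytes of UTF-8.

* 6: length in bytes
display strlen("中值")
* 2: length in characters
display ustrlen("中值")
* 4: length in display columns
display udstrlen("中值")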


UStrLower


UStrLTrim


UStrNormalize


UStrPos


UStrRegexM

Match a Unicode string against a pattern. Returns 1 if the string matches and 0 otherwise.

The optional third argument toggles case-insensitive matching. The default is 0 (case-sensitive).

generate byte begins_with_number = ustrregexm(string, "^[0-9]")
generate byte begins_with_letter = ustrregexm(string, "^[a-z]", 1)

See here for details on Stata's regular expressions.


UStrRegexRf

Match a Unicode string against a pattern and replace the first matching substring with a replacement substring.

The optional fourth argument toggles case-insensitive matching. The default is 0 (case-sensitive).

generate filename_without_extension = ustrregexrf(filename, "\.(txt|csv|tsv)$", "", 1)

See here for details on Stata's regular expressions.


UStrRegexRa

Match a Unicode string against a pattern and replace all matching substrings with a replacement substring.

The optional fourth argument toggles case-insensitive matching. The default is 0 (case-sensitive).

generate name_without_numbers = ustrregexra(name, "[0-9]", "")
generate name_without_accented_a = ustrregexra(name, "[áàȧâäǎăāãå]", "a", 1)

See here for details on Stata's regular expressions.


UStrRegexS

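Extract the nth matching subexpression (a parenthesized group in the pattern) from a prior ustrregexm test. The 0th match is the entire portion of the string that matched the pattern.

generate byte is_pipe_delimited = ustrregexm(string, "^([^|]*)[|]")
generate first_field = ustrregexs(1) if ustrregexm(string, "^([^|]*)[|]")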

See here for details on Stata's regular expressions.


UStrReverse


UStrRight

Extract the last n characters from a string.

generate last_two = ustrright(string, 2)


UStrRPos


UStrRTrim


UStrSortKey


UStrSortKeyEx


UStrTitle


UStrTo


UStrToHex


UStrToName


UStrTrim


UStrUnescape


UStrUpper


UStrWord


UStrWordCount


USubInStr


USubStr

Extract a substring from a string using start and length arguments, as usubstr(string, start, length). If the length argument is the missing value (.), the extraction continues to the end of the string.

generate skip_first_character = usubstr(string, 2, .)
generate second_character = usubstr(string, 2, 1)
generate last_character = usubstr(string, -1, 1)

The start and length parameters are character indices, irrespective of wide characters. To extract a substring that can be printed in fixed-width fonts to a fixed-length space respecting wide characters, a variant named udsubstr is also available.
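
The two variants can be contrasted with a short sketch, assuming a hypothetical string variable named label that may contain wide characters:

* First 10 characters, however many display columns they occupy
generate label_chars = usubstr(label, 1, 10)
* As many leading characters as fit within 10 display columns
generate label_cols = udsubstr(label, 1, 10)

For data containing only narrow characters, the two results should be identical.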


Word


WordBreakLocale


WordCount


Stata/StringFunctions (last edited 2023-06-13 22:48:44 by DominicRicottone)