Stata String Functions
Stata supports these string functions in the global scope.
Contents
- Stata String Functions
- Abbrev
- Char
- CollatorLocale
- CollatorVersion
- IndexNote
- Lower
- LTrim
- Plural
- Real
- RegexM
- RegexR
- RegexS
- RTrim
- Soundex
- Soundex_Nara
- String
- StrITrim
- StrLen
- StrLower
- StrLTrim
- StrOfReal
- StrPos
- StrProper
- StrReverse
- StrRPos
- StrRTrim
- StrToName
- StrTrim
- StrUpper
- SubInStr
- SubInWord
- SubStr
- ToBytes
- Trim
- UChar
- UIsDigit
- UIsLetter
- Upper
- UStrCompare
- UStrCompareEx
- UStrFix
- UStrFrom
- UStrInvalidCnt
- UStrLeft
- UStrLen
- UStrLower
- UStrLTrim
- UStrNormalize
- UStrPos
- UStrRegexM
- UStrRegexRf
- UStrRegexRa
- UStrRegexS
- UStrReverse
- UStrRight
- UStrRPos
- UStrRTrim
- UStrSortKey
- UStrSortKeyEx
- UStrTitle
- UStrTo
- UStrToHex
- UStrToName
- UStrTrim
- UStrUnescape
- UStrUpper
- UStrWord
- UStrWordCount
- USubInStr
- USubStr
- Word
- WordBreakLocale
- WordCount
Abbrev
Char
CollatorLocale
CollatorVersion
IndexNote
Lower
Deprecated name for strlower.
LTrim
Deprecated name for strltrim.
Plural
Real
RegexM
Match a string against a pattern. Returns 1 if the string matches and 0 otherwise.
The string must not contain a null byte (char(0)). While fixed-length strings cannot contain a null byte by design, long strings (strL) can. To get around this restriction, consider ustrregexm.
For example, to flag strings that begin with a digit:
generate byte begins_with_number = regexm(string, "^[0-9]")
See here for details on Stata's regular expressions.
RegexR
Match a string against a pattern and replace the first matching substring with a replacement substring.
The string must not contain a null byte (char(0)). While fixed-length strings cannot contain a null byte by design, long strings (strL) can. Returned substrings can be up to 1,100,000 bytes long. To get around these restrictions, consider ustrregexrf.
To replace more than just the first matching substring, consider ustrregexra.
generate filename_without_extension = regexr(filename,"\.(txt|csv|tsv)","")
See here for details on Stata's regular expressions.
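As a minimal sketch of the difference between replacing the first match and replacing every match, the following uses display on a literal string. The value "a1b2c3" is made up purely for illustration.
* regexr and ustrregexrf replace only the first digit; each prints ab2c3
display regexr("a1b2c3", "[0-9]", "")
display ustrregexrf("a1b2c3", "[0-9]", "")
* ustrregexra replaces every digit; prints abc
display ustrregexra("a1b2c3", "[0-9]", "")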
RegexS
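Extract the nth subexpression (a parenthesized group in the pattern) from a prior regexm match. Subexpression 0 is the entire portion of the string that matched the pattern.
Only the first 9 subexpressions are stored and available. Returned substrings can be up to 1,100,000 bytes long. To get around these restrictions, consider ustrregexs.
Because regexs reads the result of the most recent regexm call, put the regexm test in the if qualifier of the same command so that each observation is matched before its subexpression is extracted.
generate byte is_pipe_delimited = regexm(string, "^([^|]*)[|]")
generate first_field = regexs(1) if regexm(string, "^([^|]*)[|]")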
See here for details on Stata's regular expressions.
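A pattern with more than one parenthesized group can be unpacked one subexpression at a time. The sketch below assumes a small made-up dataset; the variable names record, key, and value are hypothetical, as is the key=number layout of the data.
clear
input str20 record
"id=42"
"name=x"
"count=7"
end
* regexm in the if qualifier runs for each observation before regexs is evaluated
generate key = regexs(1) if regexm(record, "^([a-z]+)=([0-9]+)")
generate value = regexs(2) if regexm(record, "^([a-z]+)=([0-9]+)")
list
Observations that do not match the pattern (here "name=x", which has no digits after the equals sign) are left with empty strings in key and value.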
RTrim
Deprecated name for strrtrim.
Soundex
Soundex_Nara
String
Alias for strofreal.
StrITrim
StrLen
StrLower
StrLTrim
StrOfReal
StrPos
StrProper
StrReverse
StrRPos
StrRTrim
StrToName
StrTrim
StrUpper
SubInStr
SubInWord
SubStr
Extract a substring from a string using a start argument and an optional length argument, as substr(string, start, length). If the optional length argument is left off or set to the missing value (.), the extraction continues to the end of the string.
generate skip_first_character = substr(string, 2)
generate skip_first_character2 = substr(string, 2, .)
generate second_character = substr(string, 2, 1)
generate last_character = substr(string, -1, 1)
The start and length parameters are byte positions rather than character indices. This makes no difference for ASCII data, but it does for multibyte encodings such as UTF-8, where a single character can span several bytes. If the optional length argument is left off and a null byte (char(0)) is encountered between the start byte position and the end of the string, the extraction ends at that null byte (excluding the null byte). To get around these restrictions, consider usubstr.
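The byte-versus-character distinction is easiest to see on a string with one non-ASCII character. This is a minimal sketch using display; the literal "café" is illustrative only, and it assumes the string is stored as UTF-8, where "é" takes two bytes.
* byte length is 5, character length is 4
display strlen("café")
display ustrlen("café")
* usubstr counts characters, so the 4th character is é
display usubstr("café", 4, 1)
* substr counts bytes, so both bytes of é (bytes 4 and 5) must be requested
display substr("café", 4, 2)
Asking substr for only one of those two bytes, as in substr("café", 4, 1), returns half of a character, which is not valid UTF-8 on its own.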
ToBytes
Trim
Deprecated name for strtrim.
UChar
UIsDigit
UIsLetter
Upper
Deprecated name for strupper.
UStrCompare
UStrCompareEx
UStrFix
UStrFrom
UStrInvalidCnt
UStrLeft
Extract the first n characters from a string.
generate first_two = ustrleft(string, 2)
UStrLen
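Return the number of Unicode characters in a string.
The returned value counts characters regardless of display width, so a wide character counts as one. To get the number of display columns a string occupies in a fixed-width font, where a wide character takes two columns, a variant named udstrlen is also available.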
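The three length functions can be compared on a string of wide characters. This is a small sketch with an illustrative literal, assuming the string is stored as UTF-8, where each of these two characters takes three bytes.
* strlen counts bytes; prints 6
display strlen("中文")
* ustrlen counts characters; prints 2
display ustrlen("中文")
* udstrlen counts display columns; prints 4
display udstrlen("中文")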
UStrLower
UStrLTrim
UStrNormalize
UStrPos
UStrRegexM
Match a Unicode string against a pattern. Returns 1 if the string matches and 0 otherwise.
The optional third argument toggles case-insensitive matching. The default is 0 (case-sensitive).
generate byte begins_with_number = ustrregexm(string, "^[0-9]")
generate byte begins_with_letter = ustrregexm(string, "^[a-z]", 1)
See here for details on Stata's regular expressions.
UStrRegexRf
Match a Unicode string against a pattern and replace the first matching substring with a replacement substring.
The optional fourth argument toggles case-insensitive matching. The default is 0 (case-sensitive).
generate filename_without_extension = ustrregexrf(filename, "\.(txt|csv|tsv)", "", 1)
See here for details on Stata's regular expressions.
UStrRegexRa
Match a Unicode string against a pattern and replace all matching substrings with a replacement substring.
The optional fourth argument toggles case-insensitive matching. The default is 0 (case-sensitive).
generate name_without_numbers = ustrregexra(name, "[0-9]", "")
generate name_without_accented_a = ustrregexra(name, "[áàȧâäǎăāãå]", "a", 1)
See here for details on Stata's regular expressions.
UStrRegexS
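Extract the nth subexpression (a parenthesized group in the pattern) from a prior ustrregexm match. Subexpression 0 is the entire portion of the string that matched the pattern.
Because ustrregexs reads the result of the most recent ustrregexm call, put the ustrregexm test in the if qualifier of the same command so that each observation is matched before its subexpression is extracted.
generate byte is_pipe_delimited = ustrregexm(string, "^([^|]*)[|]")
generate first_field = ustrregexs(1) if ustrregexm(string, "^([^|]*)[|]")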
See here for details on Stata's regular expressions.
UStrReverse
UStrRight
Extract the last n characters from a string.
generate last_two = ustrright(string, 2)
UStrRPos
UStrRTrim
UStrSortKey
UStrSortKeyEx
UStrTitle
UStrTo
UStrToHex
UStrToName
UStrTrim
UStrUnescape
UStrUpper
UStrWord
UStrWordCount
USubInStr
USubStr
Extract a substring from a string using start and length arguments, as usubstr(string, start, length). If the length argument is the missing value (.), the extraction continues to the end of the string.
generate skip_first_character = usubstr(string, 2, .)
generate second_character = usubstr(string, 2, 1)
generate last_character = usubstr(string, -1, 1)
The start and length parameters are character indices, irrespective of wide characters. To extract a substring that can be printed in fixed-width fonts to a fixed-length space respecting wide characters, a variant named udsubstr is also available.
Word
WordBreakLocale