= Stata String Functions =

Stata supports these '''string functions''' in the global scope.

<>

----

== Abbrev ==

----

== Char ==

----

== CollatorLocale ==

----

== CollatorVersion ==

----

== IndexNot ==

----

== Lower ==

Deprecated name for `strlower`.

----

== LTrim ==

Deprecated name for `strltrim`.

----

== Plural ==

----

== Real ==

----

== RegexM ==

Match a string against a pattern. Returns 1 if the string matches and 0 otherwise.

The string must not contain a null byte (`char(0)`). While fixed-length strings cannot contain a null byte by design, long strings (`strL`) can. To get around this restriction, consider [[Stata/StringFunctions#UstrRegexM|ustrregexm]].

{{{
generate byte begins_with_number = regexm(string, "^[0-9]")
}}}

See [[Stata/RegularExpressions|here]] for details on Stata's regular expressions.

----

== RegexR ==

Match a string against a pattern and replace the first matching substring with a replacement substring.

The string must not contain a null byte (`char(0)`). While fixed-length strings cannot contain a null byte by design, long strings (`strL`) can. Returned substrings can be up to 1,100,000 bytes long. To get around these restrictions, consider [[Stata/StringFunctions#UstrRegexRf|ustrregexrf]].

To replace more than just the first matching substring, consider [[Stata/StringFunctions#UstrRegexRa|ustrregexra]].

{{{
generate filename_without_extension = regexr(filename, "\.(txt|csv|tsv)", "")
}}}

See [[Stata/RegularExpressions|here]] for details on Stata's regular expressions.

----

== RegexS ==

Extract the nth parenthesized subexpression from a prior `regexm` test. Subexpression 0 is the entire portion of the string that matched the pattern.

Only the first 9 subexpressions are stored and available. Returned substrings can be up to 1,100,000 bytes long. To get around these restrictions, consider [[Stata/StringFunctions#UstrRegexS|ustrregexs]].
{{{
* regexs() reflects the most recent regexm() call, so pair them in one command
generate byte is_pipe_delimited = regexm(string, "[|]")
generate first_field = regexs(1) if regexm(string, "^([^|]+)[|]")
}}}

See [[Stata/RegularExpressions|here]] for details on Stata's regular expressions.

----

== RTrim ==

Deprecated name for `strrtrim`.

----

== Soundex ==

----

== Soundex_Nara ==

----

== String ==

Alias for [[Stata/StringFunctions#StrOfReal|strofreal]].

----

== StrITrim ==

----

== StrLen ==

----

== StrLower ==

----

== StrLTrim ==

----

== StrOfReal ==

----

== StrPos ==

----

== StrProper ==

----

== StrReverse ==

----

== StrRPos ==

----

== StrRTrim ==

----

== StrToName ==

----

== StrTrim ==

----

== StrUpper ==

----

== SubInStr ==

----

== SubInWord ==

----

== SubStr ==

Extract a substring from a string using a ''start'' argument and an optional ''length'' argument, as `substr(string, start, length)`. If the optional ''length'' argument is left off or set to the missing value (`.`), the extraction continues to the end of the string.

{{{
generate skip_first_character = substr(string, 2)
generate skip_first_character = substr(string, 2, .)
generate second_character = substr(string, 2, 1)
generate last_character = substr(string, -1, 1)
}}}

The ''start'' and ''length'' parameters are byte positions rather than character indices. This makes no difference for ASCII data, but it matters for multibyte encodings such as UTF-8, where one character can span several bytes.

If the optional ''length'' argument is left off and a null byte (`char(0)`) is encountered between the ''start'' byte position and the end of the string, the extraction ends at that null byte (excluding the null byte).

To get around these restrictions, consider [[Stata/StringFunctions#USubStr|usubstr]].

----

== ToBytes ==

----

== Trim ==

Deprecated name for `strtrim`.

----

== UChar ==

----

== UIsDigit ==

----

== UIsLetter ==

----

== Upper ==

Deprecated name for `strupper`.

----

== UStrCompare ==

----

== UStrCompareEx ==

----

== UStrFix ==

----

== UStrFrom ==

----

== UStrInvalidCnt ==

----

== UStrLeft ==

Extract the first n characters from a string.
{{{
generate first_two = ustrleft(string, 2)
}}}

----

== UStrLen ==

Return the length of a string in Unicode characters. Each character counts once, whether or not it is a wide character.

To return the number of display columns instead, which respects wide characters in fixed-width fonts, a variant named `udstrlen` is also available.

----

== UStrLower ==

----

== UStrLTrim ==

----

== UStrNormalize ==

----

== UStrPos ==

----

== UStrRegexM ==

Match a Unicode string against a pattern. Returns 1 if the string matches and 0 otherwise.

The optional third argument toggles case-insensitive matching. The default is 0 (case-sensitive).

{{{
generate byte begins_with_number = ustrregexm(string, "^[0-9]")
generate byte begins_with_letter = ustrregexm(string, "^[a-z]", 1)
}}}

See [[Stata/RegularExpressions|here]] for details on Stata's regular expressions.

----

== UStrRegexRf ==

Match a Unicode string against a pattern and replace the first matching substring with a replacement substring.

The optional fourth argument toggles case-insensitive matching. The default is 0 (case-sensitive).

{{{
generate filename_without_extension = ustrregexrf(filename, "\.(txt|csv|tsv)", "", 1)
}}}

See [[Stata/RegularExpressions|here]] for details on Stata's regular expressions.

----

== UStrRegexRa ==

Match a Unicode string against a pattern and replace all matching substrings with a replacement substring.

The optional fourth argument toggles case-insensitive matching. The default is 0 (case-sensitive).

{{{
generate name_without_numbers = ustrregexra(name, "[0-9]", "")
generate name_without_accented_a = ustrregexra(name, "[áàȧâäǎăāãå]", "a", 1)
}}}

See [[Stata/RegularExpressions|here]] for details on Stata's regular expressions.

----

== UStrRegexS ==

Extract the nth parenthesized subexpression from a prior `ustrregexm` test. Subexpression 0 is the entire portion of the string that matched the pattern.

{{{
* ustrregexs() reflects the most recent ustrregexm() call, so pair them in one command
generate byte is_pipe_delimited = ustrregexm(string, "[|]")
generate first_field = ustrregexs(1) if ustrregexm(string, "^([^|]+)[|]")
}}}

See [[Stata/RegularExpressions|here]] for details on Stata's regular expressions.
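
As a minimal sketch of reading several subexpressions from one pattern, the following builds a small hypothetical dataset and extracts the whole match and two fields. The variable name `record`, the sample values, and the local macro `pattern` are assumptions made for illustration only.

{{{
* Hypothetical sample data; the variable name record is an assumption
clear
input str20 record
"alpha|beta|gamma"
"delta|epsilon"
"no delimiter"
end

* Evaluate ustrregexm() in the if qualifier of the same command, so that
* ustrregexs() reads the match for the current observation
local pattern "^([^|]+)[|]([^|]+)"
generate whole_match  = ustrregexs(0) if ustrregexm(record, "`pattern'")
generate first_field  = ustrregexs(1) if ustrregexm(record, "`pattern'")
generate second_field = ustrregexs(2) if ustrregexm(record, "`pattern'")
list
}}}

For the first observation, `whole_match` should hold only the matched portion (`alpha|beta`) rather than the full original string. The third observation does not match, so all three new variables should be left as empty strings.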
----

== UStrReverse ==

----

== UStrRight ==

Extract the last n characters from a string.

{{{
generate last_two = ustrright(string, 2)
}}}

----

== UStrRPos ==

----

== UStrRTrim ==

----

== UStrSortKey ==

----

== UStrSortKeyEx ==

----

== UStrTitle ==

----

== UStrTo ==

----

== UStrToHex ==

----

== UStrToName ==

----

== UStrTrim ==

----

== UStrUnescape ==

----

== UStrUpper ==

----

== UStrWord ==

----

== UStrWordCount ==

----

== USubInStr ==

----

== USubStr ==

Extract a substring from a string using ''start'' and ''length'' arguments, as `usubstr(string, start, length)`. If the ''length'' argument is the missing value (`.`), the extraction continues to the end of the string.

{{{
generate skip_first_character = usubstr(string, 2, .)
generate second_character = usubstr(string, 2, 1)
generate last_character = usubstr(string, -1, 1)
}}}

The ''start'' and ''length'' parameters are character indices, irrespective of wide characters. To extract a substring that can be printed in fixed-width fonts to a fixed-length space respecting wide characters, a variant named `udsubstr` is also available.

----

== Word ==

----

== WordBreakLocale ==

----

== WordCount ==

----

CategoryRicottone