Stata Regular Expressions


Operators

Stata's regular expression syntax is modeled on the POSIX.2 standard. The following operators are supported.

Operator    Effect
*           match zero or more of the preceding expression
+           match one or more of the preceding expression
?           match either zero or one of the preceding expression
a-z         when between two characters (not operators), a dash means match a range of characters or numbers
.           match any character
\           escape a character to match the literal character that would otherwise be interpreted as an operator
^           when at the beginning of a pattern, a caret means match the beginning of string
$           when at the end of a regular expression, a dollar sign means match the end of string
|           match either the preceding expression or the following expression
[ and ]     denote a set of characters that can be matched
( and )     denote a subexpression
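
The operators above can be checked with a few one-line tests. This is a minimal sketch meant to be run from a do-file. The sample strings are made up for illustration, and each assert stops with an error if the pattern does not behave as described.

```stata
* Each assert passes silently when the expression is true.
assert regexm("abbbc", "ab*c") == 1         // * : zero or more b
assert regexm("ac", "ab*c") == 1            // * : zero b is still a match
assert regexm("ac", "ab+c") == 0            // + : at least one b is required
assert regexm("color", "colou?r") == 1      // ? : the u is optional
assert regexm("a b", "a.b") == 1            // . : any character, here a space
assert regexm("a.b", "a\.b") == 1           // \ : the dot is matched literally
assert regexm("axb", "a\.b") == 0           // \ : x is not a literal dot
assert regexm("x7", "^[a-z][0-9]$") == 1    // anchors, sets, and ranges
assert regexm("xx7", "^[a-z][0-9]$") == 0   // ^ and $ pin the whole string
assert regexm("dog", "^(cat|dog)$") == 1    // | inside a subexpression
```

Without the ^ and $ anchors, a pattern matches anywhere in the string, so regexm("xx7", "[a-z][0-9]") would succeed on the trailing x7.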


Functions

There are two sets of regular expression functions in Stata.

regexm tests a string for a pattern. regexr replaces the first matching substring in a string. regexs extracts a matched subexpression from the most recent regexm match; subexpressions are numbered 0 through 9, where 0 is the entire match. These functions assume that the string is strict ASCII and does not contain null bytes (char(0)), and they limit how many matching substrings can be accessed or manipulated.
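
As a rough illustration of how these three functions fit together, the sketch below runs them on a made-up string. The local macro names (s, whole, first) are hypothetical, and the code is meant to be run from a do-file.

```stata
* Hypothetical sample value, chosen only for illustration.
local s "id=123 id=456"

* regexm returns 1 when the pattern is found and 0 otherwise.
* After a successful match, regexs(0) is the whole match and
* regexs(1) is the first parenthesized subexpression.
if regexm("`s'", "id=([0-9]+)") {
    local whole = regexs(0)
    local first = regexs(1)
}
assert "`whole'" == "id=123"
assert "`first'" == "123"

* regexr replaces only the first match; the second id is left untouched.
assert regexr("`s'", "id=[0-9]+", "id=X") == "id=X id=456"
```

A common pattern with variables is to call regexs in the same command as regexm, for example generate newvar = regexs(1) if regexm(oldvar, "..."), where newvar and oldvar are placeholder names.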

ustrregexm tests a string for a pattern. ustrregexrf replaces the first matching substring in a string. ustrregexra replaces all matching substrings in a string. ustrregexs extracts a matching substring from a string. These functions are not subject to any of the above restrictions.
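
The Unicode functions are called the same way. The sketch below assumes a UTF-8 encoded do-file and uses a made-up string containing non-ASCII characters; the local macro names (s, num) are hypothetical.

```stata
* Hypothetical sample value with non-ASCII characters.
local s "café 12, thé 34"

* ustrregexm returns 1 on a match; ustrregexs(1) then holds the
* first parenthesized subexpression of that match.
if ustrregexm("`s'", "([0-9]+)") {
    local num = ustrregexs(1)
}
assert "`num'" == "12"

* ustrregexrf replaces only the first match.
assert ustrregexrf("`s'", "[0-9]+", "N") == "café N, thé 34"

* ustrregexra replaces every match.
assert ustrregexra("`s'", "[0-9]+", "N") == "café N, thé N"
```

The only difference between the last two calls is the function name, which makes ustrregexra the natural choice when every occurrence should be replaced.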



Stata/RegularExpressions (last edited 2022-12-27 16:27:44 by DominicRicottone)