Stata Regular Expressions
Operators
Stata tries to support the POSIX.2 standard.
Operator | Effect
* | match zero or more of the preceding expression
+ | match one or more of the preceding expression
? | match either zero or one of the preceding expression
a-z | when between two characters (not operators), a dash means match a range of characters or numbers
. | match any character
\ | escape a character to match the literal character that would otherwise be interpreted as an operator
^ | when at the beginning of a pattern, a caret means match the beginning of string
$ | when at the end of a regular expression, a dollar sign means match the end of string
| | match either the preceding expression or the following expression
[ and ] | denote a set of characters that can be matched
( and ) | denote a subexpression
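The operators above can be tried out directly with display, which shows 1 when regexm finds a match and 0 otherwise. The following is a minimal sketch; the test strings and patterns are made up for illustration.

```stata
* Anchors, character ranges, + and *: lowercase letters followed by optional digits
display regexm("abc123", "^[a-z]+[0-9]*$")

* ? makes the preceding character optional
display regexm("color", "colou?r")

* \. matches a literal period instead of "any character"
display regexm("3.14", "^[0-9]+\.[0-9]+$")

* | chooses between alternatives inside a subexpression
display regexm("dog", "^(cat|dog)$")
```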
Functions
There are two sets of regular expression functions in Stata.
regexm tests a string for a pattern. regexr replaces the first matching substring in a string. regexs extracts a matching substring (up to the 9th subexpression) from a string. These functions all assume that the string is strict ASCII and does not contain null bytes (char(0)), and they are restricted in how many matching substrings can be accessed or manipulated.
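As a rough illustration of how these three functions fit together, the sketch below builds a small made-up dataset (the variable names address, housenum, and masked are hypothetical) and uses the common pattern of calling regexs in the same command as regexm.

```stata
clear
input str30 address
"4905 Lakeway Drive"
"673 Jasmine Street"
"Lakeway Drive"
end

* regexm() flags rows whose address starts with digits followed by a space;
* regexs(1) returns the first parenthesized subexpression of that match.
generate housenum = regexs(1) if regexm(address, "^([0-9]+) ")

* regexr() replaces only the first run of digits it finds.
generate masked = regexr(address, "[0-9]+", "#")

list
```

Rows that do not match the pattern are left with an empty housenum, because the if qualifier skips them.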
ustrregexm tests a string for a pattern. ustrregexrf replaces the first matching substring in a string. ustrregexra replaces all matching substrings in a string. ustrregexs extracts a matching substring from a string. These functions are not subject to any of the above restrictions.
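The difference between replacing the first match and replacing all matches is easiest to see side by side. The sketch below uses a made-up dataset and hypothetical variable names (s, first_only, all_matches, digits), and assumes a Stata version that provides the ustrregex functions.

```stata
clear
input str20 s
"a1 b22 c333"
"room 7"
end

* ustrregexrf() replaces only the first run of digits in each string.
generate first_only = ustrregexrf(s, "[0-9]+", "#")

* ustrregexra() replaces every run of digits in each string.
generate all_matches = ustrregexra(s, "[0-9]+", "#")

* ustrregexs(0) returns the whole substring matched by ustrregexm().
generate digits = ustrregexs(0) if ustrregexm(s, "[0-9]+")

list
```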