Stata Regular Expressions

Contents

Stata Regular Expressions
1. Operators
2. Functions

Operators

Stata's regular expression syntax largely follows the POSIX.2 standard.

Operator	Effect
`*`	match zero or more of the preceding expression
`+`	match one or more of the preceding expression
`?`	match either zero or one of the preceding expression
`a-z`	when between two characters (not operators), a dash means match a range of characters or numbers
`.`	match any character
`\`	escape a character to match the literal character that would otherwise be interpreted as an operator
`^`	when at the beginning of a pattern, a caret means match the beginning of string
`$`	when at the end of a regular expression, a dollar sign means match the end of string
`|`	match either the preceding expression or the following expression
`[` and `]`	denote a set of characters that can be matched
`(` and `)`	denote a subexpression
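
As a rough illustration of the operators above, the following sketch tests a few made-up strings with regexm (described under Functions), which returns 1 when the pattern matches and 0 otherwise. The strings and patterns are hypothetical examples, not taken from any dataset.

```stata
* Anchors, sets, ranges, and +: letters followed by digits, nothing else
display regexm("abc123", "^[a-z]+[0-9]+$")

* ? makes the preceding "u" optional
display regexm("color", "colou?r")

* An escaped dot matches only a literal period
display regexm("a.b", "a\.b")

* Alternation inside a subexpression
display regexm("cat", "(cat|dog)")
```

Each of these lines should display 1. Replacing "a.b" with "axb" in the third test should display 0, because the escaped dot no longer matches any character.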

Functions

There are two sets of regular expression functions in Stata.

regexm tests a string for a pattern. regexr replaces the first matching substring in a string. regexs extracts a matching substring (up to the 9th) from a string. These functions all assume that the string is strict ASCII and does not contain null bytes (char(0)), and they are restricted in how many matching substrings can be accessed or manipulated.
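
A minimal sketch of how these three functions are commonly combined, assuming a hypothetical string variable s entered by hand. regexs only returns something after a successful regexm, so the two are usually paired in one generate statement.

```stata
* Hypothetical one-row dataset for illustration
clear
input str30 s
"Invoice 2023-0042 issued"
end

* regexs(1) and regexs(2) return the first and second subexpressions
* of the match found by regexm() in the if qualifier
generate year   = regexs(1) if regexm(s, "([0-9]+)-([0-9]+)")
generate serial = regexs(2) if regexm(s, "([0-9]+)-([0-9]+)")

* regexr() replaces only the first match: here, the first digit
generate masked = regexr(s, "[0-9]", "#")

list
```

With this input, year should hold "2023", serial "0042", and masked "Invoice #023-0042 issued".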

ustrregexm tests a string for a pattern. ustrregexrf replaces the first matching substring in a string. ustrregexra replaces all matching substrings in a string. ustrregexs extracts a matching substring from a string. These functions are free of all of the above restrictions.
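
The Unicode-aware functions follow the same calling pattern. The sketch below uses made-up strings and assumes a Stata release with Unicode support (version 14 or later).

```stata
* Replace the first match versus all matches
display ustrregexrf("a1b2c3", "[0-9]", "#")
display ustrregexra("a1b2c3", "[0-9]", "#")

* Non-ASCII text is matched as characters
display ustrregexra("café au lait", "é", "e")

* ustrregexs() pairs with ustrregexm() the same way regexs() pairs with regexm()
clear
input str30 s
"Invoice 2023-0042 issued"
end
generate year = ustrregexs(1) if ustrregexm(s, "([0-9]+)-([0-9]+)")
list
```

The three display lines should print a#b2c3, a#b#c#, and cafe au lait, and year should hold "2023".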

CategoryRicottone
