= Python Re =

'''`re`''' is a module for regular expressions.

----

== Usage ==

=== Match ===

Scans a string to see if it matches a pattern from its beginning. If it matches, a '''match object''' is returned. Otherwise `None` is returned.

A match object is a collection of '''match groups'''. The first (0th) match group is the entire match.

{{{
from re import match

listed_price = "$100"  # example input
m = match(r"\$[1-9][0-9]*", listed_price)
if m is not None:
    print("USD", m.group(0))
    # or: print(m.expand(r"USD \g<0>"))
}}}

Subpatterns are included as subsequent match groups.

{{{
from re import match

listed_price = "$100"  # example input
m = match(r"\$([1-9][0-9]*)", listed_price)
if m is not None:
    print("USD", m.group(1))
    # or: print(m.expand(r"USD \1"))
}}}

A tuple of the subsequent match groups is also available from the `groups()` method.

{{{
from re import match

currency_plus_listed_price = "USD $100"  # example input
m = match(r"([A-Z]{3}) (\$[1-9][0-9]*)", currency_plus_listed_price)
if m is not None:
    current_listed_price_pair = m.groups()
    # ('USD', '$100')
}}}

=== Search ===

Similar to `match()`, but is not constrained to the beginning of a string.

=== FullMatch ===

Similar to `match()`, but requires that the ''entire'' string match the pattern.

=== FindAll ===

Scans a string for a pattern and returns all substrings that match.

{{{
from re import findall

m = findall(r"\$[1-9][0-9]*", "$1 $2 $3")
# ['$1', '$2', '$3']
}}}

=== Split ===

Splits a string at every match of a pattern. If the pattern includes a capturing subpattern, the text matched by that subpattern is also included in the result.

{{{
from re import split

s = split('-', 'a-b-c')
# ['a', 'b', 'c']

s = split('-', '-a-b-c-')
# ['', 'a', 'b', 'c', '']

s = split('(-)', 'a-b-c')
# ['a', '-', 'b', '-', 'c']
}}}

=== Sub ===

Returns a new string built by substituting all matches of a pattern with a replacement.

{{{
import re

s = re.sub("[\t ]+", ";", "a whitespace delimited string")
# "a;whitespace;delimited;string"

s = re.sub("[\t ]+", ";", "a whitespace delimited string", count=1)
# "a;whitespace delimited string"
}}}

The replacement can include backreferences.
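As a minimal sketch of backreferences in a replacement (the patterns and input strings here are made up for illustration), numbered groups can be reordered, and the `\g<N>` form avoids ambiguity when a literal digit follows the reference:

```python
import re

# Swap "key=value" into "value=key" using numbered backreferences.
swapped = re.sub(r"(\w+)=(\w+)", r"\2=\1", "a=1 b=2")

# \g<1> followed by a literal "0"; writing \10 would mean group 10 instead.
padded = re.sub(r"(\d+)", r"\g<1>0", "7 apples")

print(swapped)  # 1=a 2=b
print(padded)   # 70 apples
```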
`\6` is replaced with the substring in match group 6. For this reason, backslashes are handled specially in this function. 'Known' escape sequences (like `\n`) are processed and converted to the character they represent (a newline). 'Unknown' escape sequences made of a backslash and an ASCII letter raise an error. All others, such as `\&`, are left as-is.

The replacement can also be a callback function. It is passed the match object as an argument, and is expected to return a string.

{{{
import re

def redact_external_emails(m):
    # Keep addresses at example.com (case-insensitive); drop all others.
    if m.group(0).lower().endswith("@example.com"):
        return m.group(0)
    else:
        return ''

re.sub(r"[A-Za-z]+@[A-Za-z]+\.[A-Za-z]+", redact_external_emails, "me@example.com you@elsewhere.org They@Example.com")
# "me@example.com  They@Example.com"
}}}

=== SubN ===

Similar to `sub()`, but returns a tuple of the new string and a count of substitutions performed.

=== Compile ===

The other functions described above take a pattern string as the first argument. The regular expression engine internally compiles (and caches) that pattern.

If a pattern will be reused frequently, it can be more efficient to compile the pattern once and reuse it directly. The `compile()` function returns such a compiled pattern. It has methods mirroring the functions above.

{{{
from re import compile

p = compile(r"\$([1-9][0-9]*)")

g = p.match("$1000000").groups()
# ('1000000',)

g = p.search("$1000000").groups()
# ('1000000',)

s = p.sub(r"\1 dollars", "$1000000")
# '1000000 dollars'
}}}

----

== Type Annotations ==

`Match` can be used to annotate a match object. `Pattern` can be used to annotate a compiled regular expression.

Both are generic over `typing.AnyStr`, and can be further constrained by annotating with `Match[str]`, `Match[bytes]`, `Pattern[str]`, or `Pattern[bytes]`.

----

== See also ==

[[https://docs.python.org/3/library/re.html|Python re module documentation]]

[[https://pymotw.com/3/re/|Python Module of the Day article for re]]

----
CategoryRicottone