Dec 21, 2021

Regular Expression with Python

Metacharacters

[] : A set of characters "[a-m]"

\ : Signals a special sequence (can also be used to escape special characters) "\d"

. : Any single character (except a newline character) "he..o"

^ : Starts with "^hello"

$ : Ends with "planet$"

* : Zero or more occurrences "he.*o"

+ : One or more occurrences "he.+o"

? : Zero or one occurrence "he.?o"

{} : Exactly the specified number of occurrences "he{2}o"

| : Either or "x|y" matches either "x" or "y"

() : Capture and group
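The metacharacters above can be tried out with a few calls from Python's re module. This is a minimal sketch: the sample strings are hypothetical, and re.fullmatch() is used for the repetition examples only because it requires the pattern to cover the whole string, which makes the difference between *, + and ? easy to see.

```python
import re

sample = "hello planet"  # hypothetical sample string

# [] : a set of characters; collect every lowercase letter from a to m
letters = re.findall(r"[a-m]", sample)

# . : any single character except a newline
dot = re.search(r"he..o", sample)

# ^ and $ : anchor the pattern to the start and to the end of the string
starts = re.search(r"^hello", sample) is not None
ends = re.search(r"planet$", sample) is not None

# *, + and ? : test 0, 1 and 2 characters between "he" and "o"
# he.*o -> [True, True, True]
# he.+o -> [False, True, True]
# he.?o -> [True, True, False]
repetition = {
    pattern: [bool(re.fullmatch(pattern, s)) for s in ("heo", "helo", "hello")]
    for pattern in (r"he.*o", r"he.+o", r"he.?o")
}

# {} : exactly two "e" characters
braces = [bool(re.fullmatch(r"he{2}o", s)) for s in ("heo", "heeo")]

# | : either "x" or "y"
either = re.findall(r"x|y", "axbyc")

# () : capture groups that can be read back after the match
groups = re.search(r"(hello) (planet)", sample)

print(letters)                               # ['h', 'e', 'l', 'l', 'l', 'a', 'e']
print(dot.group() if dot else None)          # hello
print(starts, ends)                          # True True
for pattern, results in repetition.items():
    print(pattern, results)
print(braces)                                # [False, True]
print(either)                                # ['x', 'y']
print(groups.groups() if groups else None)   # ('hello', 'planet')
```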


Special Sequences

\A Returns a match if the specified characters are at the beginning of the string "\AThe"

\b Returns a match where the specified characters are at the beginning or at the end of a word: r"\bain" or r"ain\b"

(The "r" prefix makes the pattern a "raw string", so Python passes "\b" to the regex engine unchanged instead of reading it as a backspace character.)

\B Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word: r"\Bain" or r"ain\B" (again written as raw strings)

\d Returns a match where the string contains digits (numbers from 0-9) "\d"

\D Returns a match where the string DOES NOT contain digits "\D"

\s Returns a match where the string contains a white space character "\s"

\S Returns a match where the string DOES NOT contain a white space character "\S"

\w Returns a match where the string contains any word characters (letters from a to z and A to Z, digits from 0-9, and the underscore _ character) "\w"

\W Returns a match where the string DOES NOT contain any word characters "\W"

\Z Returns a match if the specified characters are at the end of the string "Spain\Z"
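Here is a small sketch of the special sequences in action. The sentence and the other sample strings are hypothetical; the search for "\bain" written without the r prefix shows why raw strings matter, because Python turns "\b" into a backspace character before re ever sees it.

```python
import re

# Hypothetical sample sentence used for the anchor and boundary examples
text = "The rain in Spain stays mainly in the plain"

# \A and \Z anchor to the very start and the very end of the string
starts_with_the = re.search(r"\AThe", text) is not None      # True
ends_with_spain = re.search(r"Spain\Z", text) is not None    # False: text ends with "plain"
ends_with_plain = re.search(r"plain\Z", text) is not None    # True

# \b: "ain" at a word boundary; no word starts with "ain",
# while the "ain" in rain, Spain and plain sits at the end of a word
word_start = re.findall(r"\bain", text)   # []
word_end = re.findall(r"ain\b", text)     # ['ain', 'ain', 'ain']

# \B: "ain" NOT at a boundary (rain, Spain, mainly, plain / only mainly)
not_start = re.findall(r"\Bain", text)    # four matches
not_end = re.findall(r"ain\B", text)      # ['ain']

# Without the r prefix, "\b" is a one-character backspace, so nothing matches
print(len("\b"), len(r"\b"))              # 1 2
backspace_search = re.search("\bain", text)

# Digit, whitespace and word-character classes and their negations
digits = re.findall(r"\d", "Room 101")
non_digits = re.findall(r"\D+", "Room 101")
spaces = len(re.findall(r"\s", text))
words = re.findall(r"\w+", "snake_case and 42!")
non_words = re.findall(r"\W", "snake_case and 42!")

print(starts_with_the, ends_with_spain, ends_with_plain)
print(word_start, word_end, not_start, not_end, backspace_search)
print(digits, non_digits, spaces, words, non_words)
```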

Sets or Character Classes

[arn] Returns a match where one of the specified characters (a, r, or n) is present

[a-n] Returns a match for any lower case character, alphabetically between a and n

[^arn] Returns a match for any character EXCEPT a, r, and n

[0123] Returns a match where any of the specified digits (0, 1, 2, or 3) is present

[0-9] Returns a match for any digit between 0 and 9

[0-5][0-9] Returns a match for any two-digit number from 00 to 59

[a-zA-Z] Returns a match for any character alphabetically between a and z, lower case OR upper case

[+] In sets, +, *, ., |, (), $ and {} have no special meaning, so [+] means: return a match for any + character in the string
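Each set from the list can be checked with re.findall(), which returns every non-overlapping match. The sample strings below are hypothetical, and the minutes check uses re.fullmatch() so that the whole two-character string has to fit [0-5][0-9].

```python
import re

# Hypothetical sample strings, one per set
one_of = re.findall(r"[arn]", "rain")          # ['r', 'a', 'n']
a_to_n = re.findall(r"[a-n]", "Spain")         # ['a', 'i', 'n']
not_arn = re.findall(r"[^arn]", "rain")        # ['i']
low_digits = re.findall(r"[0123]", "2024")     # ['2', '0', '2']
any_digit = re.findall(r"[0-9]", "a1b2")       # ['1', '2']
letters = re.findall(r"[a-zA-Z]", "R2-D2")     # ['R', 'D']

# Inside a set, + and . are ordinary characters
plus_sign = re.findall(r"[+]", "1+2=3")        # ['+']
literal_dot = re.findall(r"[.]", "v1.2")       # ['.']

# [0-5][0-9] as a minutes/seconds check: "60" fails because 6 is not in 0-5
minutes = {s: bool(re.fullmatch(r"[0-5][0-9]", s)) for s in ("00", "07", "59", "60")}

print(one_of, a_to_n, not_arn, low_digits, any_digit, letters)
print(plus_sign, literal_dot, minutes)
```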

Python regular expressions

The re module in Python provides functions and support for regular expressions.

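re.search() : Scans the whole string and returns a match object for the first location where the pattern matches, or None if there is no match

re.match() : Returns a match object only if the pattern matches at the beginning of the string, otherwise None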
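The difference between re.search() and re.match() is only where they are allowed to start matching. A minimal sketch on a hypothetical sentence:

```python
import re

text = "I love Python"  # hypothetical sample string

# re.search() scans forward through the string and stops at the first match
found = re.search(r"Python", text)

# re.match() only tries position 0, so "Python" is not found here
not_at_start = re.match(r"Python", text)

# A pattern that does begin at position 0 succeeds with re.match()
at_start = re.match(r"I love", text)

# Both functions return None on failure, so check before calling .group()
if found:
    print(found.group(), found.span())   # Python (7, 13)
print(not_at_start)                      # None
if at_start:
    print(at_start.group())              # I love
```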

re.sub() : Replaces matches of the pattern with a replacement string and returns the new string

Replace "xxx" or "yyy" with "abc"

>>> import re

>>> old_string = "I xxx Python yyy"

>>> new_string = re.sub(r"xxx|yyy", "abc", old_string)

>>> print(new_string)

I abc Python abc

Replace only the first "xxx" with "abc" (the count argument limits the number of replacements)

>>> old_string = "abc xxx abc xxx"

>>> new_string = re.sub("xxx", "abc", old_string, count=1)

>>> print(new_string)

abc abc abc xxx
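To see what count does, compare the default (count=0, which replaces every match) with count=1. Passing count by keyword is also safer: as far as I know, Python 3.13 and later deprecate passing it positionally to re.sub(). The sketch also uses re.subn(), which returns the new string together with the number of replacements made.

```python
import re

old_string = "abc xxx abc xxx"

# count defaults to 0, which means "replace every match"
all_replaced = re.sub("xxx", "abc", old_string)

# count=1 stops after the first replacement
first_only = re.sub("xxx", "abc", old_string, count=1)

# re.subn() does the same work but also reports how many replacements were made
new_string, replacements = re.subn(r"xxx|yyy", "abc", "I xxx Python yyy")

print(all_replaced)              # abc abc abc abc
print(first_only)                # abc abc abc xxx
print(new_string, replacements)  # I abc Python abc 2
```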