
Showing posts with label Natural Language Processing. Show all posts

Dec 21, 2021

Regular Expression with Python

 Metacharacters

[] : A set of characters

\ : Signals a special sequence (can also be used to escape special characters) "\d"

. : Single dot to match any character (except newline character) "he..o"

^ : Starts with "^hello"

$ : Ends with "planet$"

* : Zero or more occurrences "he.*o"

+ : One or more occurrences "he.+o"

? : Zero or one occurrences "he.?o"

{} : Exactly the specified number of occurrences "he{2}o"

| : Either or "x|y" matches either "x" or "y"

() : Capture and group
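As a quick check of the metacharacters listed above, here is a minimal sketch that runs several of the example patterns through re.search(). The sample strings and the names `examples`, `results`, and `captured` are made up for illustration and are not part of the original list.

```python
import re

# Each entry pairs a pattern from the list above with a made-up sample string.
examples = [
    (r"he..o", "hello planet"),    # . matches any single character
    (r"^hello", "hello planet"),   # ^ anchors the match at the start
    (r"planet$", "hello planet"),  # $ anchors the match at the end
    (r"he.*o", "heo"),             # * allows zero characters between "he" and "o"
    (r"he.+o", "heo"),             # + needs at least one character, so no match
    (r"he.?o", "helo"),            # ? allows zero or one character
    (r"he{2}o", "heeo"),           # {2} means exactly two "e" characters
    (r"x|y", "say"),               # | matches either alternative
]

results = {}
for pattern, text in examples:
    found = re.search(pattern, text)
    # Store the matched text, or None when the pattern does not match.
    results[pattern] = found.group() if found else None
    print(pattern, repr(text), "->", results[pattern])

# () captures the part of the match inside the parentheses.
captured = re.search(r"he(.+)o", "hello").group(1)
print(captured)  # ll
```

Note that "he.+o" does not match "heo", while "he.*o" does: that is the whole difference between + and *.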


Special Sequences

\A Returns a match if the specified characters are at the beginning of the string "\AThe"

\b Returns a match where the specified characters are at the beginning or at the end of a word

(the "r" prefix makes Python treat the string as a "raw string", so the backslash is passed to the regex engine unchanged) r"\bain"

r"ain\b"

\B Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word

(the "r" prefix makes Python treat the string as a "raw string", so the backslash is passed to the regex engine unchanged) r"\Bain"

r"ain\B"

\d Returns a match where the string contains digits (numbers from 0-9) "\d"

\D Returns a match where the string DOES NOT contain digits "\D"

\s Returns a match where the string contains a white space character "\s"

\S Returns a match where the string DOES NOT contain a white space character "\S"

\w Returns a match where the string contains any word characters (characters from a to z and A to Z, digits from 0-9, and the underscore _ character) "\w"

\W Returns a match where the string DOES NOT contain any word characters "\W"

\Z Returns a match if the specified characters are at the end of the string "Spain\Z"
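The special sequences above can be tried out with a short sketch like the following. The sample strings and variable names are chosen for illustration only.

```python
import re

text = "The rain in Spain"  # sample sentence, chosen for illustration

starts = re.search(r"\AThe", text)       # matches only at the start of the string
ends = re.search(r"Spain\Z", text)       # matches only at the end of the string
word_start = re.findall(r"\bain", text)  # "ain" at the start of a word: none here
word_end = re.findall(r"ain\b", text)    # "ain" at the end of a word: rain, Spain
inside = re.findall(r"\Bain", text)      # "ain" that is NOT at the start of a word
digits = re.findall(r"\d", "Room 42")    # every digit is a separate match
non_space = re.findall(r"\S+", text)     # runs of non-whitespace characters
word_chars = re.findall(r"\w+", "snake_case, 2 words")  # letters, digits, underscore

print(starts.group(), ends.group())  # The Spain
print(word_start)                    # []
print(word_end)                      # ['ain', 'ain']
print(inside)                        # ['ain', 'ain']
print(digits)                        # ['4', '2']
print(non_space)                     # ['The', 'rain', 'in', 'Spain']
print(word_chars)                    # ['snake_case', '2', 'words']
```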

Sets or Character Class

[arn] Returns a match where one of the specified characters (a, r, or n) is present

[a-n] Returns a match for any lower case character, alphabetically between a and n

[^arn] Returns a match for any character EXCEPT a, r, and n

[0123] Returns a match where any of the specified digits (0, 1, 2, or 3) is present

[0-9] Returns a match for any digit between 0 and 9

[0-5][0-9] Returns a match for any two-digit number from 00 to 59

[a-zA-Z] Returns a match for any character alphabetically between a and z, lower case OR upper case

[+] In sets, +, *, ., |, (), $, {} have no special meaning, so [+] means: return a match for any + character in the string
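Here is a small sketch of the sets above in action, using re.findall() and re.sub(). The input strings and variable names are invented for the example.

```python
import re

text = "The rain in Spain"  # sample sentence, chosen for illustration

# [arn] matches any single a, r, or n
arn = re.findall(r"[arn]", text)

# [0-5][0-9] matches two-digit numbers from 00 to 59 (e.g. minutes on a clock)
minutes = re.findall(r"[0-5][0-9]", "08:45 and 23:59, but 7:75")

# Inside a set, + is a literal plus sign, not a quantifier
plus_signs = re.findall(r"[+]", "1 + 2 + 3")

# [^a-zA-Z] matches anything that is NOT a letter; here we strip those characters
letters_only = re.sub(r"[^a-zA-Z]", "", "R2-D2 & C-3PO")

print(arn)           # ['r', 'a', 'n', 'n', 'a', 'n']
print(minutes)       # ['08', '45', '23', '59']
print(plus_signs)    # ['+', '+']
print(letters_only)  # RDCPO
```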

Python regular expression 

The re module in Python provides functions and support for regular expressions.

re.search()

re.match()
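The difference between the two functions above is easy to miss, so here is a minimal sketch: re.match() only looks at the beginning of the string, while re.search() scans the whole string. The sample text and variable names are made up for illustration.

```python
import re

text = "I love Python"  # sample sentence, chosen for illustration

# re.match() succeeds only if the pattern matches at the very start of the string
no_match = re.match(r"Python", text)     # None, "Python" is not at the start
start_match = re.match(r"I love", text)  # a match object

# re.search() scans the string and returns the first match found anywhere
found = re.search(r"Python", text)

print(no_match)                                    # None
print(start_match.group())                         # I love
print(found.group(), found.start(), found.end())   # Python 7 13
```

Both functions return None when nothing matches, so it is worth checking the result before calling .group() on it.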

re.sub()

Replace "xxx" or "yyy" with "abc"

>>> import re

>>> old_string = "I xxx Python yyy"

>>> new_string = re.sub(r"xxx|yyy", "abc", old_string)

Replace only the first "xxx" with "abc"

>>> old_string = "abc xxx abc xxx"

>>> new_string = re.sub("xxx", "abc", old_string, count=1)

>>> print(new_string)

Jan 4, 2021

All About Python - what can Python do?

 


One of the most commonly asked questions about Python is: what can Python do? As a fast-growing language, Python has been used in many domains, including data analytics, data visualization, model development, natural language processing, and many others.

1. Python for data analytics

This domain has usually been dominated by R, SAS, SQL, MATLAB, etc. With Python's rich libraries, you can achieve almost everything that these languages and software packages can do, and there is only one programming language to learn. Sounds amazing? If you use SAS for data processing, you are familiar with working on tables: summaries, aggregations, and table joins. The Python alternative is 'pandas'. Pandas provides a table interface called a data frame, along with a large set of functions that cover most table operations. If you need to work on arrays, matrices, or high-dimensional data, 'numpy' is the library that provides an array interface to process lists of lists ...

2. Python for data visualization
Data visualization, or graphics, is not easy to do in many other languages. This area used to be dominated by software packages such as MATLAB, Origin, etc. With Python, there are many libraries for producing graphics, such as matplotlib, bokeh, seaborn, etc.

3. Python for model development

Model development, or machine learning, is one of the most attractive applications of Python. For research and study, scikit-learn (sklearn) is good enough to cover the major machine learning models and provides the facilities to build them. Its best feature is a consistent model development API that is now popular with model developers. For more advanced areas, like deep learning, the most widely used libraries are TensorFlow and PyTorch, which originated at Google and Facebook, respectively.

4. Python for natural language processing

Natural language processing (NLP) is one of the trending subjects in artificial intelligence. Most programming languages offer little or no support for this subject. For study and research, the Natural Language Toolkit (NLTK) is a good library to get started with. Other libraries, such as spaCy, provide more advanced capabilities for processing natural language.