A regular expression (also known as a regex or regexp) is a sequence of characters that defines a search pattern. Regular expressions are a powerful tool for matching patterns in text, allowing you to search for and extract specific information from a string.

In a regular expression, you can use metacharacters to define rules for matching patterns. Metacharacters are special characters and symbols that have a specific meaning. For example, the period “.” matches any single character, and the asterisk “*” matches zero or more occurrences of the preceding character.

Regular expressions are used in a wide range of applications. Some popular ones include:
- Data validation: Regular expressions can be used to validate user input, such as validating email addresses, phone numbers, or credit card numbers.
- Text parsing: Regular expressions can be used to extract specific information from a text string, such as finding all instances of a particular word or phrase, or extracting data from structured text data formats like CSV files.
- Web scraping: Regular expressions can be used to scrape data from websites by searching for specific patterns in the HTML or XML code.
- Search and replace: Regular expressions can be used to search for and replace text in a document or file, which can be especially useful when dealing with large amounts of text.
- Programming: Regular expressions are widely used in programming languages like Python, JavaScript, and Perl for tasks like string manipulation, text processing, and data analysis.
- Command-line tools: Many command-line tools, such as grep and sed in Unix-based systems, support regular expressions for searching and manipulating text files.
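As a small illustration of the data validation use case, the sketch below checks whether a string looks like a phone number. The NNN-NNN-NNNN format, the PHONE_PATTERN name, and the is_phone_number helper are assumptions chosen for this example, not a general-purpose validator.

```python
import re

# Hypothetical format for this example: three digits, three digits, four digits,
# separated by hyphens (for instance 555-123-4567).
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")

def is_phone_number(text):
    """Return True if the whole string matches the assumed phone format."""
    # fullmatch() succeeds only when the pattern covers the entire string
    return PHONE_PATTERN.fullmatch(text) is not None

print(is_phone_number("555-123-4567"))  # True
print(is_phone_number("555-1234"))      # False
```

Using fullmatch() rather than search() matters for validation: search() would also accept a longer string that merely contains a valid-looking number somewhere inside it.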
Getting Started
To use regular expressions in Python, you first need to import the re module. Here’s an example:
import re
Basic Syntax
The simplest regular expression is literal text that you want to find in a string. For example, the regular expression cat will match any string that contains the text “cat”, including inside longer words such as “catalog”.
import re
pattern = "cat"
text = "The cat is black and white."
match = re.search(pattern, text)
if match:
    print("Match found!")
else:
    print("Match not found.")
This will output “Match found!” because the word “cat” is present in the text string.
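The match object returned by re.search() carries more than a yes/no answer. As a minimal sketch using the same sample text, the following reads the matched text and its position through the match object's group(), start(), and end() methods:

```python
import re

text = "The cat is black and white."
match = re.search("cat", text)

if match:
    print(match.group())  # the matched text: cat
    print(match.start())  # index where the match begins: 4
    print(match.end())    # index just past the match: 7
```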
Metacharacters
In addition to literal text, regular expressions can use metacharacters to match patterns. Some commonly used metacharacters are:
- . : matches any single character except a newline
- * : matches zero or more occurrences of the preceding character
- + : matches one or more occurrences of the preceding character
- ? : matches zero or one occurrence of the preceding character
- | : matches either the expression before or after the pipe symbol
- [] : matches any one character within the brackets
- () : groups expressions together
- ^ : matches the start of the string
- $ : matches the end of the string
Here are some examples:
import re
# match any string that starts with "cat"
pattern = "^cat"
# match any string that ends with "cat"
pattern = "cat$"
# match any string that contains "cat" followed by any single character
pattern = "cat."
# match any string that contains "c", followed by zero or more "a" characters, followed by "t"
pattern = "ca*t"
# match any string that contains "c", followed by one or more "a" characters, followed by "t"
pattern = "ca+t"
# match any string that contains "cat" or "dog"
pattern = "cat|dog"
# match any string that contains a lowercase vowel
pattern = "[aeiou]"
# match any string that contains "cat" or "dog", capturing the matched word in a group
pattern = "(cat|dog)"
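To see how these metacharacters behave, the sketch below runs several patterns against sample strings with re.search(). The sample strings, the colou?r pattern, and the cases and results names are assumptions added for illustration.

```python
import re

# Each entry: (pattern, sample text, whether re.search() is expected to find a match)
cases = [
    (r"^cat", "catalog", True),
    (r"^cat", "bobcat", False),
    (r"cat$", "bobcat", True),
    (r"cat.", "cats", True),
    (r"cat.", "cat", False),        # "." needs one more character after "cat"
    (r"ca*t", "ct", True),          # zero "a" characters is allowed
    (r"ca+t", "ct", False),         # at least one "a" is required
    (r"ca+t", "caaat", True),
    (r"colou?r", "color", True),    # "?" makes the "u" optional
    (r"cat|dog", "hot dog", True),
    (r"[aeiou]", "rhythm", False),  # no lowercase vowel present
]

results = [bool(re.search(pattern, text)) for pattern, text, _ in cases]
for (pattern, text, expected), found in zip(cases, results):
    print(f"{pattern!r} in {text!r}: {found} (expected {expected})")
```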
Using re.search() and re.findall()
The re.search() function returns a match object if the pattern is found in the string, or None if the pattern is not found.
import re
pattern = "cat"
text = "The cat is black and white."
match = re.search(pattern, text)
if match:
    print("Match found!")
else:
    print("Match not found.")
The re.findall() function returns a list of all non-overlapping matches of the pattern in the string.
import re
pattern = "cat"
text = "The cat is black and white. Another cat is sleeping."
matches = re.findall(pattern, text)
print(matches)
This will output ['cat', 'cat'] because there are two occurrences of the word “cat” in the text string.
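When the position of each match matters as well as its text, re.finditer() is a related option: it yields one match object per non-overlapping match. A minimal sketch with the same sample text (the positions name is only for this example):

```python
import re

text = "The cat is black and white. Another cat is sleeping."

# finditer() yields a match object for every non-overlapping match
positions = [(m.group(), m.start()) for m in re.finditer("cat", text)]
print(positions)  # [('cat', 4), ('cat', 36)]
```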
Using re.sub()
The re.sub() function can be used to replace parts of a string that match a pattern with a new string.
import re
pattern = "cat"
text = "The cat is black and white. Another cat is sleeping."
new_text = re.sub(pattern, "dog", text)
print(new_text)
This will output “The dog is black and white. Another dog is sleeping.”
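Two further features of re.sub() are often useful: the count argument limits how many matches are replaced, and backreferences such as \1 in the replacement string reuse text captured by a group. The sketch below shows both on the same sample text; the variable names are only illustrative.

```python
import re

text = "The cat is black and white. Another cat is sleeping."

# count=1 limits the replacement to the first match only
first_only = re.sub("cat", "dog", text, count=1)
print(first_only)  # The dog is black and white. Another cat is sleeping.

# \1 and \2 in the replacement refer to the text captured by groups 1 and 2
swapped = re.sub(r"(black) and (white)", r"\2 and \1", text)
print(swapped)  # The cat is white and black. Another cat is sleeping.
```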
Other Python Regex examples
1. Find words between two strings in a sentence
Here’s an example code snippet that shows how to find the words between two strings in a sentence using lookaround assertions:
import re
sentence = "The quick brown fox jumps over the lazy dog."
start_word = "quick"
end_word = "dog"
pattern = r'(?<=' + re.escape(start_word) + r')\s+(.*?)\s+(?=' + re.escape(end_word) + r')'
matches = re.findall(pattern, sentence)
print(matches)
In this example, we define a regular expression pattern that captures the text that is preceded by the start_word and followed by the end_word, with one or more whitespace characters on each side. The pattern uses a positive lookbehind (?<=start_word) to ensure that the start word is present before the match, and a positive lookahead (?=end_word) to ensure that the end word is present after the match. Between them, the capturing group (.*?) lazily matches as few characters as possible, and re.escape() ensures that any metacharacters in the two words are treated as literal text.
The re.findall() function is used to find all occurrences of the pattern in the sentence string. Because the pattern contains one capturing group, it returns a list of the captured text for each match. In this case, the output would be ['brown fox jumps over the lazy'], which is the text between “quick” and “dog”.
Note that we use the r prefix before each piece of the regular expression pattern to indicate that it is a raw string, which allows us to use backslashes without them being interpreted as escape characters.
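When the delimiters themselves contain metacharacters, such as brackets, they need to be escaped before being placed in a pattern. The sketch below wraps the idea in a hypothetical helper, text_between, and swaps the lookaround assertions for plain escaped delimiters around a capturing group, which gives the same kind of result for simple cases.

```python
import re

def text_between(sentence, start, end):
    """Return every stretch of text found between `start` and `end`."""
    # re.escape() makes characters such as "[" or "." in the delimiters literal
    pattern = re.escape(start) + r"\s*(.*?)\s*" + re.escape(end)
    # With one capturing group, findall() returns only the captured text
    return re.findall(pattern, sentence)

print(text_between("price: [42 USD] and [7 EUR]", "[", "]"))
# ['42 USD', '7 EUR']
```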
2. Extract data from structured text formats like CSV files
Here’s an example code snippet that shows how to extract data from a CSV file using regular expressions:
import re
# Define the regular expression pattern for a single CSV field
pattern = r'(?:"[^"]*"|[^,\n])+'
# Open the CSV file and process it one line (row) at a time
with open('data.csv') as file:
    for line in file:
        # Use the re.findall() function to find all fields in this row
        fields = re.findall(pattern, line)
        # Print the list of fields to the console
        print(fields)
In this example, we define a regular expression pattern that matches one field of a CSV file. The pattern uses a non-capturing group (?:"[^"]*"|[^,\n]) that matches either a string enclosed in double quotes (i.e. "[^"]*"), or any single character that is not a comma or a newline (i.e. [^,\n]). The + quantifier at the end of the group matches one or more occurrences of the group, so a comma inside a quoted string does not end the field.
We then open the CSV file and read it line by line, and use the re.findall() function to find all occurrences of the pattern in each line. Because the pattern has no capturing groups, findall() returns a list of strings, one per field, with quoted fields keeping their double quotes.
Finally, we print each list, so the output is one list of fields per row in the CSV file. Note that this simple pattern skips empty fields and does not handle newlines inside quoted fields; for real-world CSV data, Python’s built-in csv module is generally more robust.
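To try field extraction without a data.csv file on disk, the sketch below applies a simplified field pattern to an in-memory sample and compares the result with Python's built-in csv module. The sample data and the field_pattern, regex_rows, and csv_rows names are assumptions made for this example.

```python
import csv
import io
import re

sample = 'name,quote\nAda,"Hello, world"\n'

# Regex approach: a field is a run of quoted strings and/or characters that
# are not a comma or newline; quoted fields keep their double quotes
field_pattern = r'(?:"[^"]*"|[^,\n])+'
regex_rows = [re.findall(field_pattern, line) for line in sample.splitlines()]
print(regex_rows)  # [['name', 'quote'], ['Ada', '"Hello, world"']]

# csv module approach: surrounding quotes are removed from quoted fields
csv_rows = list(csv.reader(io.StringIO(sample)))
print(csv_rows)    # [['name', 'quote'], ['Ada', 'Hello, world']]
```

Both approaches keep the comma inside the quoted field, but only the csv module strips the surrounding quotes, which is one reason it is usually the safer choice for real CSV files.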
Further reading
If you are interested in learning more about regular expressions, here are some resources that can help you:
- Regular-Expressions.info: This website provides a comprehensive tutorial on regular expressions, including a quick-start guide, reference material, and examples in various programming languages.
- Mastering Regular Expressions: This book by Jeffrey Friedl is a comprehensive guide to regular expressions, covering everything from the basics to advanced topics like lookahead and back references.
- Python’s re module documentation: The official documentation for Python’s re module provides a detailed reference for all the regular expression functions and metacharacters available in Python.
- Regular Expressions Cookbook: This book by Jan Goyvaerts and Steven Levithan provides practical solutions for common regular expression tasks, with examples in multiple programming languages.
- regex101.com: This website allows you to test regular expressions and see how they match against sample text. It supports multiple programming languages and provides detailed explanations of the regular expressions used.
- Regular Expressions 101: This website also allows you to test regular expressions, and provides a quick reference guide for the most commonly used metacharacters.
These resources can help you get started with regular expressions and improve your skills over time.