Working with Regular Expressions
A regular expression defines a pattern. The pattern is compared against a target string, and the string either:
- Matches the pattern (success), or
- Doesn’t match (failure).

Common uses for regular expressions include:
- Validation: email addresses, phone numbers, passwords
- Text extraction: pulling data from unstructured text
- Search and replace: complex find/replace operations
- Text processing: cleaning, transforming, and parsing text
- Data scraping: extracting specific information from documents
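As a rough illustration of the first three use cases, the sketch below uses JavaScript's built-in regex literals and string methods. The email pattern is deliberately simplified and the sample strings are invented for the example. Real-world validation usually needs stricter rules.

```javascript
// Validation: a deliberately simple email check (not RFC-complete).
const emailPattern = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
const isValidEmail = emailPattern.test("user@example.com"); // true
const isInvalidEmail = emailPattern.test("not-an-email");   // false

// Text extraction: pull every run of digits out of free-form text.
const numbers = "Order 66 shipped 3 items".match(/\d+/g); // ["66", "3"]

// Search and replace: collapse runs of whitespace into a single space.
const cleaned = "too    many   spaces".replace(/\s+/g, " "); // "too many spaces"

console.log(isValidEmail, isInvalidEmail, numbers, cleaned);
```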
| Character | Meaning |
|---|---|
| `.` | Matches any character except newline |
| `\d` | Matches any digit (0-9) |
| `\w` | Matches any word character (alphanumeric + underscore) |
| `\s` | Matches any whitespace character |
| `\D`, `\W`, `\S` | Negated versions (non-digit, non-word, non-whitespace) |
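To see these shorthand classes in action, here is a minimal JavaScript sketch. The `sample` string is made up for the example.

```javascript
// Sample text containing a space and a tab as whitespace.
const sample = "Room 42\tis_open";

const digits = sample.match(/\d/g);                 // ["4", "2"]
const firstWord = sample.match(/\w+/)[0];           // "Room"
const whitespaceCount = sample.match(/\s/g).length; // 2 (one space, one tab)
const nonWordChars = sample.match(/\W/g);           // [" ", "\t"]

// The dot does not match a newline by default.
const dotSkipsNewline = /a.b/.test("a\nb");         // false

console.log(digits, firstWord, whitespaceCount, nonWordChars, dotSkipsNewline);
```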
| Symbol | Meaning |
|---|---|
| `*` | 0 or more times |
| `+` | 1 or more times |
| `?` | 0 or 1 time (optional) |
| `{n}` | Exactly n times |
| `{n,}` | n or more times |
| `{n,m}` | Between n and m times |
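The quantifiers above could be exercised as in the following JavaScript sketch. The patterns are anchored with `^` and `$` so that the whole string has to match, and the test strings are arbitrary examples.

```javascript
// ? makes the preceding "u" optional.
const optionalU = /^colou?r$/;
const bothSpellings = optionalU.test("color") && optionalU.test("colour"); // true

// * allows zero occurrences, + requires at least one.
const zeroOrMore = /^ab*$/.test("a"); // true
const oneOrMore = /^ab+$/.test("a");  // false

// Counted repetition with braces.
const exactlyFour = /^\d{4}$/.test("2024");  // true
const twoToThree = /^\d{2,3}$/.test("1234"); // false: four digits is too many
const atLeastTwo = /^\d{2,}$/.test("1234");  // true

console.log(bothSpellings, zeroOrMore, oneOrMore, exactlyFour, twoToThree, atLeastTwo);
```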
- Square brackets `[ ]` define a character class.
- Match any single character from the specified set.
- Examples:
  - `[aeiou]` - matches any vowel
  - `[0-9]` - matches any digit (same as `\d`)
  - `[a-zA-Z]` - matches any letter (upper or lowercase)
  - `[^0-9]` - matches any character that’s NOT a digit
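A short JavaScript sketch of the character classes listed above, using arbitrary sample strings:

```javascript
// Count lowercase vowels with a character class.
const vowelCount = "regular expression".match(/[aeiou]/g).length; // 7

// A range-based class: letters only, upper or lowercase.
const lettersOnly = /^[a-zA-Z]+$/.test("RegExp"); // true

// [0-9] and its negation [^0-9] split a string into two halves.
const withoutDigits = "a1b2c3".replace(/[0-9]/g, ""); // "abc"
const digitsOnly = "a1b2c3".replace(/[^0-9]/g, "");   // "123"

console.log(vowelCount, lettersOnly, withoutDigits, digitsOnly);
```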
| Pattern | Description |
|---|---|
| `(abc)` | Capturing group |
| `(?:abc)` | Non-capturing group |
| `[abc]` | a, b, or c |
| `[^abc]` | Not a, b, or c |
| `[a-z]` | Range (lowercase) |
| `[A-Z]` | Range (uppercase) |
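The difference between capturing and non-capturing groups shows up in the array returned by `String.prototype.match`. This is a small sketch with a made-up date string:

```javascript
// Two capturing groups: index 0 is the full match, 1 and 2 are the groups.
const capturing = "2024-05".match(/(\d{4})-(\d{2})/);
const year = capturing[1];  // "2024"
const month = capturing[2]; // "05"

// (?:...) groups without storing, so the month becomes group 1.
const nonCapturing = "2024-05".match(/(?:\d{4})-(\d{2})/);
const monthOnly = nonCapturing[1];      // "05"
const entryCount = nonCapturing.length; // 2: full match plus one group

console.log(year, month, monthOnly, entryCount);
```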
```js
/yes|no/
```
Matches either "yes" or "no".
| Symbol | Meaning |
|---|---|
| `^` | Start of string or line |
| `$` | End of string or line |
| `\b` | Word boundary |
| `\B` | Not a word boundary |
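The anchors in this table match positions rather than characters. A minimal JavaScript sketch, with sample strings chosen for illustration:

```javascript
const startsWithHello = /^hello/.test("hello world"); // true
const endsWithWorld = /world$/.test("hello world");   // true
const wholeString = /^hello$/.test("hello world");    // false: extra text after "hello"

// \b on both sides matches "cat" only as a standalone word.
const wordCat = "cat concat cats".match(/\bcat\b/g); // ["cat"]

// \B requires that there is NO word boundary before "cat",
// so only the "cat" inside "concat" qualifies.
const innerCat = "cat concat cats".match(/\Bcat/g);  // ["cat"]
const innerCatIndex = "cat concat cats".search(/\Bcat/); // 7

console.log(startsWithHello, endsWithWorld, wholeString, wordCat, innerCat, innerCatIndex);
```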
| Flag | Description |
|---|---|
| `g` | Global search (find all matches) |
| `i` | Case-insensitive search |
| `m` | Multi-line mode (`^` and `$` match line start/end) |
| `s` | Allows `.` to match newline characters |
| `u` | Unicode mode |
| `y` | Sticky search (matches from `lastIndex`) |
Example:
```js
const regex = /hello/gi;
```
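To make the effect of each flag more concrete, the sketch below runs a few patterns against one multi-line string. The sample text and variable names are chosen for this example only.

```javascript
const text = "Hello\nhello\nHELLO";

// g + i: find every match, ignoring case.
const allMatches = text.match(/hello/gi);        // ["Hello", "hello", "HELLO"]

// No flags: only the first case-sensitive match, which starts at index 6.
const firstIndex = text.match(/hello/).index;    // 6

// m: ^ and $ match at the start and end of each line.
const lineMatches = text.match(/^hello$/gim).length; // 3

// s: the dot also matches the newline between the first two lines.
const withDotAll = /Hello.hello/s.test(text);    // true
const withoutDotAll = /Hello.hello/.test(text);  // false

// u: the dot consumes a whole code point instead of a single UTF-16 unit.
const emojiWithU = /^.$/u.test("😀");            // true
const emojiWithoutU = /^.$/.test("😀");          // false

// y: the match must start exactly at lastIndex.
const sticky = /hello/y;
sticky.lastIndex = 6;
const stickyHit = sticky.test(text);  // true: "hello" starts at index 6
sticky.lastIndex = 5;
const stickyMiss = sticky.test(text); // false: index 5 is the newline

console.log(allMatches, firstIndex, lineMatches, withDotAll, stickyHit, stickyMiss);
```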
- Parentheses `( )` create capture groups
- Used to:
  - Apply quantifiers to entire sequences
  - Extract specific parts of the match
  - Reference matched text with backreferences
```python
import re

# Example: capturing name parts
pattern = r"(\w+)\s(\w+)"
text = "John Smith"
match = re.match(pattern, text)
# Captures: group 1 = "John", group 2 = "Smith"
print(match.group(1), match.group(2))
```
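The three uses of capture groups listed above could look like this in JavaScript. The sample strings are invented for illustration.

```javascript
// Apply a quantifier to a whole sequence: "ab" repeated exactly three times.
const repeated = /^(ab){3}$/.test("ababab"); // true

// Extract specific parts of the match by group number.
const nameMatch = "John Smith".match(/(\w+)\s(\w+)/);
const firstName = nameMatch[1]; // "John"
const lastName = nameMatch[2];  // "Smith"

// Backreference: \1 must repeat whatever group 1 matched,
// which makes it possible to find a doubled word.
const doubled = "this is is a test".match(/\b(\w+)\s\1\b/);
const doubledWord = doubled[1]; // "is"

// In a replacement string, $1 and $2 refer to the captured groups.
const swapped = "John Smith".replace(/(\w+)\s(\w+)/, "$2, $1"); // "Smith, John"

console.log(repeated, firstName, lastName, doubledWord, swapped);
```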
- `|` pipe symbol for alternation (OR operator)
- `(?:...)` for non-capturing groups
- Examples:
  - `cat|dog` matches “cat” or “dog”
  - `I love (cats|dogs)` matches “I love cats” or “I love dogs”
  - `(?:https?|ftp)://` matches “http://”, “https://”, or “ftp://”
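A sketch of the alternation examples in JavaScript follows. The URLs use placeholder hostnames. The last two lines show why grouping matters: without parentheses, the anchors apply to only one alternative each.

```javascript
const pets = "cat, dog, bird".match(/cat|dog/g); // ["cat", "dog"]

const lovePattern = /^I love (cats|dogs)$/;
const lovesCats = lovePattern.test("I love cats");   // true
const lovesBirds = lovePattern.test("I love birds"); // false

// Forward slashes are escaped inside a JavaScript regex literal.
const scheme = /^(?:https?|ftp):\/\//;
const schemeResults = [
  "http://a.example",
  "https://a.example",
  "ftp://a.example",
  "mailto:someone@a.example",
].map((url) => scheme.test(url)); // [true, true, true, false]

// ^cat|dog$ means "starts with cat" OR "ends with dog".
const ungrouped = /^cat|dog$/.test("catalog");   // true
const grouped = /^(?:cat|dog)$/.test("catalog"); // false

console.log(pets, lovesCats, lovesBirds, schemeResults, ungrouped, grouped);
```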
- Lookahead: `(?=...)` positive, `(?!...)` negative
- Lookbehind: `(?<=...)` positive, `(?<!...)` negative
- Zero-width assertions (don’t consume characters)
- Examples:
  - `\w+(?=\s)` - word followed by whitespace
  - `(?<=\$)\d+` - digits preceded by dollar sign
  - `\b\w+\b(?!\s+and\b)` - word NOT followed by “and”
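The lookaround examples above can be tried out with the JavaScript sketch below. The sample strings are made up, and lookbehind needs a reasonably recent JavaScript engine. The negative lookbehind on the last line is an extra illustration that is not among the examples listed above.

```javascript
// Positive lookahead: a word followed by whitespace (the space is not consumed).
const beforeSpace = "hello world".match(/\w+(?=\s)/g); // ["hello"]

// Positive lookbehind: digits preceded by a dollar sign.
const dollarAmounts = "Cost: $25 or 30 euros".match(/(?<=\$)\d+/g); // ["25"]

// Negative lookahead: whole words NOT followed by "and".
const notBeforeAnd = "salt and pepper".match(/\b\w+\b(?!\s+and\b)/g); // ["and", "pepper"]

// Negative lookbehind: whole numbers NOT preceded by a dollar sign.
const nonDollar = "Cost: $25 or 30 euros".match(/(?<!\$)\b\d+/g); // ["30"]

console.log(beforeSpace, dollarAmounts, notBeforeAnd, nonDollar);
```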