Working with Regular Expressions

A regular expression defines a pattern. The pattern is compared against a target string, and the string either:

  • Matches the pattern (success),
  • Or doesn’t match (failure).
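
The success/failure outcome can be sketched in JavaScript as follows. The pattern and the sample strings are made up for illustration; `test()` is the standard `RegExp` method that reports whether a match exists.

```javascript
// Hypothetical pattern for illustration: three digits, a hyphen, four digits.
const phonePattern = /\d{3}-\d{4}/;

// test() returns true when the pattern is found in the string, false otherwise.
const success = phonePattern.test("Call 555-1234 today");
const failure = phonePattern.test("No number here");

console.log(success, failure); // true false
```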

Common uses of regular expressions include:

  • Validation: Email addresses, phone numbers, passwords
  • Text extraction: Pulling data from unstructured text
  • Search and Replace: Complex find/replace operations
  • Text Processing: Cleaning, transforming, and parsing text
  • Data Scraping: Extracting specific information from documents
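
A rough sketch of three of these uses (validation, extraction, and search and replace). The patterns are illustrative assumptions; in particular, the email check is deliberately loose and is not a complete address validator.

```javascript
// Validation: a loose "something@something.something" shape check (assumption, not RFC-complete).
const looksLikeEmail = /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test("ada@example.com");

// Text extraction: pull every run of digits out of free text.
const numbers = "Order 66 shipped 3 items".match(/\d+/g);

// Search and replace: collapse each run of whitespace into a single space.
const cleaned = "too    many   spaces".replace(/\s+/g, " ");

console.log(looksLikeEmail); // true
console.log(numbers);        // [ '66', '3' ]
console.log(cleaned);        // too many spaces
```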

Character     Meaning
.             Matches any character except newline
\d            Matches any digit (0-9)
\w            Matches any word character (alphanumeric + underscore)
\s            Matches any whitespace character
\D, \W, \S    Negated versions (non-digit, non-word, non-whitespace)
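
The character shorthands above might be exercised like this in JavaScript. The sample string is invented for the example.

```javascript
// Sample text containing letters, digits, an underscore, spaces, and a tab.
const sample = "Room 42, floor_3\tEast";

const digits = sample.match(/\d/g);                  // every single digit
const words = sample.match(/\w+/g);                  // runs of word characters
const whitespaceCount = sample.match(/\s/g).length;  // two spaces plus one tab
const nonDigitsRemoved = sample.replace(/\D/g, "");  // \D is the negation of \d

// The dot stops at a newline by default.
const firstLine = "ab\ncd".match(/.+/)[0];

console.log(digits);           // [ '4', '2', '3' ]
console.log(words);            // [ 'Room', '42', 'floor_3', 'East' ]
console.log(whitespaceCount);  // 3
console.log(nonDigitsRemoved); // 423
console.log(firstLine);        // ab
```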

Symbol    Meaning
*         0 or more times
+         1 or more times
?         0 or 1 time (optional)
{n}       Exactly n times
{n,}      n or more times
{n,m}     Between n and m times
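
A small sketch of how each quantifier behaves, using made-up inputs. The `^` and `$` anchors are added here only so that the counted forms must cover the whole string.

```javascript
// * allows zero occurrences, so it still "matches" (an empty string) when no "a" is present.
const starOnEmpty = "bbb".match(/a*/)[0];
const starGreedy = "aaab".match(/a*/)[0];

// + requires at least one occurrence.
const plusFails = /a+/.test("bbb");

// ? makes the preceding item optional.
const bothSpellings = ["color", "colour"].every((w) => /^colou?r$/.test(w));

// Counted repetition.
const exactlyFour = /^\d{4}$/.test("2024");
const atLeastTwo = /^\d{2,}$/.test("7");
const twoToFour = /^\d{2,4}$/.test("12345");

console.log(JSON.stringify(starOnEmpty), starGreedy); // "" aaa
console.log(plusFails, bothSpellings);                // false true
console.log(exactlyFour, atLeastTwo, twoToFour);      // true false false
```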

  • Square brackets [ ] define a character class.
  • A character class matches any single character from the specified set.
  • Examples:
    • [aeiou] - matches any vowel
    • [0-9] - matches any digit (same as \d)
    • [a-zA-Z] - matches any letter (upper or lowercase)
    • [^0-9] - matches any character that’s NOT a digit
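
The character classes listed above could be tried out as follows. The input strings are illustrative only.

```javascript
// [aeiou]: each vowel is matched one character at a time.
const vowels = "regular expression".match(/[aeiou]/g);

// [0-9]: digits only.
const digitsOnly = "A1b2".match(/[0-9]/g);

// [a-zA-Z]: anchored with ^ and $ so the whole string must consist of letters.
const allLetters = /^[a-zA-Z]+$/.test("Hello");
const lettersAndDigit = /^[a-zA-Z]+$/.test("Hello1");

// [^0-9]: a leading ^ inside the brackets negates the class.
const strippedNonDigits = "A1b2".replace(/[^0-9]/g, "");

console.log(vowels.length);               // 7
console.log(digitsOnly);                  // [ '1', '2' ]
console.log(allLetters, lettersAndDigit); // true false
console.log(strippedNonDigits);           // 12
```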

Pattern     Description
(abc)       Capturing group
(?:abc)     Non-capturing group
[abc]       a, b, or c
[^abc]      Not a, b, or c
[a-z]       Range (lowercase)
[A-Z]       Range (uppercase)

/yes|no/

Matches either "yes" or "no"
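
One way to see the difference between capturing and non-capturing groups, and how the unanchored /yes|no/ pattern behaves, is the sketch below. The test strings are assumptions chosen for the example.

```javascript
// A capturing group stores what it matched; the result array gets an extra entry.
const capturing = "abcabc".match(/(abc)+/);

// A non-capturing group groups without storing, so only the full match is returned.
const nonCapturing = "abcabc".match(/(?:abc)+/);

// /yes|no/ is unanchored, so it also succeeds when "no" appears inside a longer word.
const plainYes = /yes|no/.test("yes please");
const insideWord = /yes|no/.test("nothing");
const neither = /yes|no/.test("maybe");

console.log(capturing[0], capturing[1], capturing.length); // abcabc abc 2
console.log(nonCapturing[0], nonCapturing.length);         // abcabc 1
console.log(plainYes, insideWord, neither);                // true true false
```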


Symbol    Meaning
^         Start of string or line
$         End of string or line
\b        Word boundary
\B        Not a word boundary
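
A brief sketch of the anchors in action. The phrases are invented for illustration.

```javascript
// ^ and $ pin the match to the start and end of the string.
const startsWithCat = /^cat/.test("cat nap");
const catNotAtStart = /^cat/.test("a cat");
const endsWithCat = /cat$/.test("a cat");

// \b matches only where a word starts or ends, so "concat" is skipped.
const wholeWordCats = "cat concat cat".match(/\bcat\b/g);

// \B requires that "cat" is NOT at a word start, so only the one inside "concat" matches.
const embeddedCats = "cat concat cat".match(/\Bcat/g);

console.log(startsWithCat, catNotAtStart, endsWithCat); // true false true
console.log(wholeWordCats.length);                      // 2
console.log(embeddedCats.length);                       // 1
```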

Flag    Description
g       Global search (find all matches)
i       Case-insensitive search
m       Multi-line mode (^ and $ match line start/end)
s       Allows . to match newline characters
u       Unicode mode
y       Sticky search (matches from lastIndex)

Example:

const regex = /hello/gi;
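
The effect of each flag might be observed with a sketch like this one. The sample strings are assumptions; the behavior shown is that of standard JavaScript `RegExp` flags.

```javascript
const greetings = "Hello\nhello\nHELLO";

// g finds every match; i ignores case.
const caseSensitive = greetings.match(/hello/g);
const caseInsensitive = greetings.match(/hello/gi);

// With m, ^ and $ match at each line; without it they match only at the string ends.
const wholeLines = greetings.match(/^hello$/gim);
const noMultiline = greetings.match(/^hello$/gi);

// s lets the dot match a newline.
const dotDefault = /a.b/.test("a\nb");
const dotAll = /a.b/s.test("a\nb");

// u treats an astral character as one code point (two UTF-16 code units).
const emoji = "\u{1F600}";
const withUnicode = emoji.match(/./u)[0].length;
const withoutUnicode = emoji.match(/./)[0].length;

// y only matches starting exactly at lastIndex.
const sticky = /foo/y;
sticky.lastIndex = 3;
const stickyAtThree = sticky.test("barfoo");
sticky.lastIndex = 0;
const stickyAtZero = sticky.test("barfoo");

console.log(caseSensitive.length, caseInsensitive.length); // 1 3
console.log(wholeLines.length, noMultiline);               // 3 null
console.log(dotDefault, dotAll);                           // false true
console.log(withUnicode, withoutUnicode);                  // 2 1
console.log(stickyAtThree, stickyAtZero);                  // true false
```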
  • Parentheses ( ) create capture groups
  • Used to:
    • Apply quantifiers to entire sequences
    • Extract specific parts of the match
    • Reference matched text with backreferences
# Example: capturing name parts (Python)
import re

pattern = r"(\w+)\s(\w+)"
text = "John Smith"
match = re.match(pattern, text)
# Captures: group 1 = "John", group 2 = "Smith"
print(match.group(1), match.group(2))
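
The same capture, along with the other two uses of groups listed above (quantifying a sequence and backreferences), could look like this in JavaScript. The variable names and inputs are illustrative.

```javascript
// Extract specific parts of the match: index 1 and 2 hold the two groups.
const nameMatch = "John Smith".match(/(\w+)\s(\w+)/);
const [, firstName, lastName] = nameMatch;

// Reuse captured text in a replacement: $1 and $2 refer to the groups.
const swapped = "John Smith".replace(/(\w+)\s(\w+)/, "$2, $1");

// Apply a quantifier to an entire sequence.
const threeTimes = /^(?:ab){3}$/.test("ababab");

// Backreference: \1 must repeat exactly what group 1 captured (here, a doubled word).
const hasDoubledWord = /\b(\w+)\s\1\b/.test("it is is fine");

console.log(firstName, lastName); // John Smith
console.log(swapped);             // Smith, John
console.log(threeTimes);          // true
console.log(hasDoubledWord);      // true
```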
  • | pipe symbol for alternation (OR operator)
  • (?:...) for non-capturing groups
  • Examples:
    • cat|dog matches “cat” or “dog”
    • I love (cats|dogs) matches “I love cats” or “I love dogs”
    • (?:https?|ftp):// matches “http://”, “https://”, or “ftp://”
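
The alternation examples above can be checked with a sketch like the following. The URLs are placeholders, and the last two lines illustrate an assumption worth verifying in your own patterns: | has low precedence, so anchors bind to one branch unless the alternatives are grouped.

```javascript
// Plain alternation matches either word anywhere in the string.
const hasPet = /cat|dog/.test("hotdog stand");

// Grouped alternation inside a longer pattern; the group captures which branch matched.
const lovedAnimal = /^I love (cats|dogs)$/.exec("I love dogs")[1];

// Non-capturing group limits the alternation to the scheme name.
const scheme = /^(?:https?|ftp):\/\//;
const schemeResults = [
  "http://a.example",
  "https://a.example",
  "ftp://a.example",
  "mailto:a@example.com",
].map((url) => scheme.test(url));

// Without a group, /^cat|dog$/ means "starts with cat" OR "ends with dog".
const ungrouped = /^cat|dog$/.test("catalog");
const grouped = /^(?:cat|dog)$/.test("catalog");

console.log(hasPet, lovedAnimal); // true dogs
console.log(schemeResults);       // [ true, true, true, false ]
console.log(ungrouped, grouped);  // true false
```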
  • Lookahead: (?=...) positive, (?!...) negative
  • Lookbehind: (?<=...) positive, (?<!...) negative
  • Zero-width assertions (don’t consume characters)
  • Examples:
    • \w+(?=\s) - word followed by whitespace
    • (?<=\$)\d+ - digits preceded by dollar sign
    • \b\w+\b(?!\s+and\b) - word NOT followed by “and”
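
A short sketch of the lookaround examples above, plus one negative lookbehind. The input strings are made up, and lookbehind support is assumed (it is available in current Node.js and modern browsers).

```javascript
// Positive lookahead: the whitespace is required but not included in the match.
const wordBeforeSpace = "hello world".match(/\w+(?=\s)/)[0];

const priceText = "Total: $45 and 30 units";

// Positive lookbehind: digits that directly follow a dollar sign.
const dollarAmounts = priceText.match(/(?<=\$)\d+/g);

// Negative lookbehind: whole numbers that do NOT follow a dollar sign.
const plainNumbers = priceText.match(/(?<!\$)\b\d+/g);

// Negative lookahead: words that are not followed by "and".
const notBeforeAnd = "salt and pepper".match(/\b\w+\b(?!\s+and\b)/g);

console.log(wordBeforeSpace, wordBeforeSpace.length); // hello 5
console.log(dollarAmounts);                           // [ '45' ]
console.log(plainNumbers);                            // [ '30' ]
console.log(notBeforeAnd);                            // [ 'and', 'pepper' ]
```

Note that "salt" is excluded in the last result because it is the only word immediately followed by "and".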