Regular Expressions Cheatsheet

Anchors

^	Start of string / line
$	End of string / line
\b	Word boundary
\B	Not a word boundary
\A	Start of string (no multiline)
\Z	End of string or before final newline (no multiline)
\z	Absolute end of string
\G	Start of current search (pos)
\K	Reset match start (keep left)
(?m)^	Start of each line (multiline)
(?m)$	End of each line (multiline)
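
A minimal Python sketch of how the multiline flag changes ^ while \A stays pinned to the start of the string. The sample text is made up for illustration, and other flavors may treat some anchors differently.

```python
import re

text = "first line\nsecond line\nthird line"

# Without MULTILINE, ^ only matches at the very start of the string.
starts_default = re.findall(r"^\w+", text)

# With MULTILINE, ^ also matches right after each newline.
starts_multiline = re.findall(r"^\w+", text, flags=re.MULTILINE)

# \A always means "start of string", even with MULTILINE enabled.
starts_absolute = re.findall(r"\A\w+", text, flags=re.MULTILINE)

print(starts_default)    # ['first']
print(starts_multiline)  # ['first', 'second', 'third']
print(starts_absolute)   # ['first']
```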

Character Classes

.	Any char except newline
\w	Word char [a-zA-Z0-9_]
\W	Non-word character
\d	Digit [0-9]
\D	Non-digit
\s	Whitespace [ \t\n\r\f]
\S	Non-whitespace
[abc]	Any of a, b, or c
[^abc]	Not a, b, or c
[a-z]	Range a to z
[a-zA-Z0-9]	Alphanumeric
\p{L}	Unicode letter (PCRE/Java)
\p{N}	Unicode number

Quantifiers

*	0 or more (greedy)
+	1 or more (greedy)
?	0 or 1 (greedy)
{n}	Exactly n times
{n,}	n or more times
{n,m}	Between n and m times
*?	0 or more (lazy)
+?	1 or more (lazy)
??	0 or 1 (lazy)
*+	0 or more (possessive)
++	1 or more (possessive)
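
Greedy versus lazy is easiest to see side by side. The sketch below uses Python's re module and an invented HTML snippet; possessive quantifiers are left out because not every engine or version supports them.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: + takes as much as possible, so the match runs to the last '>'.
greedy = re.findall(r"<.+>", html)

# Lazy: +? takes as little as possible, so each tag is matched on its own.
lazy = re.findall(r"<.+?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```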

Groups

(abc)	Capture group
(?:abc)	Non-capturing group
(?<name>abc)	Named capture group
(?P<name>abc)	Named group (Python)
\1	Backreference to group 1
(?P=name)	Named backreference (Python)
(?>abc)	Atomic group (no backtrack)
a|b	Alternation: a or b
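
A short Python sketch of named groups and a numbered backreference. The input strings are illustrative only.

```python
import re

# Named groups capture the year, month, and day separately.
date_re = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
m = date_re.search("released on 2024-03-15")
parts = m.groupdict()

# \1 refers back to whatever group 1 matched, here a repeated word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a typo")

print(parts)             # {'year': '2024', 'month': '03', 'day': '15'}
print(doubled.group(1))  # is
```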

Lookarounds

(?=abc)	Positive lookahead
(?!abc)	Negative lookahead
(?<=abc)	Positive lookbehind
(?<!abc)	Negative lookbehind

Lookahead example

\d+(?=px) → matches digits before "px"

Lookbehind example

(?<=\$)\d+ → matches digits after "$"
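
Both examples can be tried in Python as follows. The sample strings are made up; the point is that the lookaround text is required but never becomes part of the match.

```python
import re

css = "width: 120px; height: 80px; line-height: 1.5em"
price = "Total: $45 plus $5 shipping, 3 items"

# Lookahead: digits must be followed by "px", but "px" is not part of the match.
pixels = re.findall(r"\d+(?=px)", css)

# Lookbehind: digits must be preceded by "$", which is also left out of the match.
dollars = re.findall(r"(?<=\$)\d+", price)

print(pixels)   # ['120', '80']
print(dollars)  # ['45', '5']
```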

Flags / Modifiers

i	Case insensitive
g	Global — find all matches
m	Multiline — ^ and $ per line
s	Dotall — . matches \n too
x	Verbose — ignore whitespace
u	Unicode mode
(?ims)	Multiple inline flags
(?-i)	Turn off flag inline
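
How the flags look in Python, as a rough sketch with an invented log snippet. Python has no g flag; re.findall and re.finditer play that role.

```python
import re

text = "Error: disk full\nerror: retry\nWARNING: low memory"

# re.I makes the match case-insensitive; re.M lets ^ and $ work per line.
errors = re.findall(r"^error: (.+)$", text, flags=re.I | re.M)

# The same flags written inline at the start of the pattern.
errors_inline = re.findall(r"(?im)^error: (.+)$", text)

# Without re.S the dot stops at a newline; with it the match spans lines.
single_line = re.search(r"Error.*memory", text)
across_lines = re.search(r"Error.*memory", text, flags=re.S)

# re.X (verbose) ignores pattern whitespace and allows comments.
level = re.compile(r"""
    ^(\w+)   # log level
    :\s      # separator
""", re.X | re.M)
levels = level.findall(text)

print(errors)  # ['disk full', 'retry']
print(levels)  # ['Error', 'error', 'WARNING']
```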

Escape Sequences

\n	Newline (LF, U+000A)
\r	Carriage return (CR, U+000D)
\t	Horizontal tab (U+0009)
\v	Vertical tab (U+000B)
\f	Form feed (U+000C)
\a	Bell (U+0007)
\e	Escape (U+001B)
\0	Null character (U+0000)
\xFF	Hex byte (e.g. \x41 = A)
\uFFFF	Unicode code point (4 hex)
\u{1F600}	Unicode (JS with /u flag)
\cX	Control char (e.g. \cM = CR)
\\	Literal backslash
\. \* \+ \? \^ \$	Literal meta characters
\[ \] \{ \}	Literal brackets
\| \/	Literal pipe / slash

Replacement / Substitution

$1 or \1	Insert capture group 1
${1}	Group 1 (unambiguous)
$&	Entire match
$`	String before match
$'	String after match
$$	Literal $ in replacement
$<name>	Named group (JS)
\g<name>	Named group (Python)
\g<1>	Group by number (Python)
${name}	Named group (.NET)
\U\1	Uppercase group (Perl/PCRE)
\L\1	Lowercase group (Perl/PCRE)
\E	End \U or \L
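
A Python sketch of the substitution syntax. Python's re has no \U or \L in replacement strings, so a replacement function is used for case changes instead. The inputs are illustrative.

```python
import re

date = "2024-03-15"

# \g<name> inserts a named group; here the date is reordered to DD/MM/YYYY.
reordered = re.sub(
    r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})",
    r"\g<d>/\g<m>/\g<y>",
    date,
)

# \g<1> avoids ambiguity when a digit follows the group reference:
# r"\10" would be read as group 10, not group 1 followed by "0".
padded = re.sub(r"(\d)", r"\g<1>0", "a1b2")

# A function as the replacement can transform each match, e.g. uppercase it.
capitalized = re.sub(r"\b(\w)", lambda m: m.group(1).upper(), "hello world")

print(reordered)    # 15/03/2024
print(padded)       # a10b20
print(capitalized)  # Hello World
```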

Useful Links

regex101.com
Online tester with explanation (PCRE, JS, Python, Go)

regexr.com
Visual regex tester with community patterns

regexcrossword.com
Learn regex by solving crossword puzzles

regular-expressions.info
Deep dive reference for all flavors

MDN Regex Guide
JavaScript regex documentation

Python re docs
Official Python regex module reference

Go regexp syntax
Go RE2 syntax reference

PCRE2 syntax
Full PCRE2 pattern syntax

mqtt-regex
Converts MQTT topic patterns into RegExp objects (JS)

HiveMQ MQTT Topics
MQTT topic syntax, wildcards + and # explained

Common Patterns

[\w.+-]+@[\w-]+\.[a-z]{2,}	Email
https?://[\w./%-]+(\?[\w=&%-]+)?	URL
\d{1,3}(\.\d{1,3}){3}	IPv4 address
([0-9a-f]{2}:){5}[0-9a-f]{2}	MAC address
^\+?[\d\s()\-]{7,15}$	Phone number
^#?[0-9a-fA-F]{6}$	Hex color (#RRGGBB)
^#?([0-9a-fA-F]{3}){1,2}$	Hex color (3 or 6 digits)
\d{4}-\d{2}-\d{2}	Date YYYY-MM-DD
\d{2}[\/.-]\d{2}[\/.-]\d{4}	Date DD/MM/YYYY
\d{2}:\d{2}(:\d{2})?	Time HH:MM[:SS]

^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}$	Strong password
^[a-zA-Z0-9_]{3,20}$	Username
^\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}$	Credit card number
^[a-z][a-z0-9\-]{1,61}[a-z0-9]$	Domain name
[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}	UUID / GUID
<[^>]+>	HTML tag (basic)
\/\*[\s\S]*?\*\/	/* Block comment */
^\s+|\s+$	Leading/trailing whitespace
\b\d+(\.\d+)?\b	Integer or float
^([01]?\d|2[0-3]):[0-5]\d$	24-hour time HH:MM
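
A sketch that exercises three of these patterns in Python, with each pattern written out in full. The helper is_ipv4 is a hypothetical addition: the basic IPv4 pattern only checks the shape, so the octet range is checked separately.

```python
import re

# Strong password: at least one digit, one lowercase, one uppercase, 8+ chars.
STRONG_PASSWORD = re.compile(r"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}$")

# 24-hour time HH:MM; the group keeps the alternation inside the anchors.
TIME_24H = re.compile(r"^([01]?\d|2[0-3]):[0-5]\d$")

# IPv4 (basic shape only): on its own this would accept 999.1.1.1.
IPV4 = re.compile(r"\d{1,3}(\.\d{1,3}){3}")

def is_ipv4(s: str) -> bool:
    """Hypothetical helper: shape check plus a 0-255 range check per octet."""
    return bool(IPV4.fullmatch(s)) and all(int(o) <= 255 for o in s.split("."))

print(bool(STRONG_PASSWORD.match("Passw0rdX")))  # True
print(bool(TIME_24H.match("24:00")))             # False
print(is_ipv4("192.168.1.42"))                   # True
```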

JavaScript

/pattern/flags	Regex literal
new RegExp(str, flags)	Dynamic regex
re.test(str)	Returns true/false
str.match(re)	Array of matches or null
str.matchAll(re)	Iterator of all matches
str.search(re)	Index of first match or -1
str.replace(re, '$1')	Replace with string or fn
str.replaceAll(re, s)	Replace all (needs g flag)
str.split(re)	Split by pattern

Python (re module)

re.match(p, s)	Match at start of string
re.search(p, s)	Search anywhere in string
re.findall(p, s)	List of all matches
re.finditer(p, s)	Iterator of match objects
re.sub(p, repl, s)	Replace matches
re.subn(p, repl, s)	Replace + count
re.split(p, s)	Split by pattern
re.compile(p, re.I)	Compile with flags
m.group(1)	Get capture group value
m.groupdict()	Named groups as dict
m.span()	Start and end positions
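
The calls above in one place, as a small sketch. The input string and the pattern are invented for the example.

```python
import re

text = "alice=30, bob=25, carol=41"
pair = re.compile(r"(?P<name>\w+)=(?P<age>\d+)")

first = pair.search(text)
first_span = first.span()          # start and end offsets of the match
first_groups = first.groupdict()   # named groups as a dict

all_pairs = pair.findall(text)     # list of (name, age) tuples
names = [m.group("name") for m in pair.finditer(text)]

masked, count = re.subn(r"\d+", "?", text)   # replaced text plus count
fields = re.split(r",\s*", text)

# match() only tries position 0, so "bob" is not found this way.
no_match = re.match(r"bob", text)

print(first_span)  # (0, 8)
print(all_pairs)   # [('alice', '30'), ('bob', '25'), ('carol', '41')]
print(masked)      # alice=?, bob=?, carol=?
```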

POSIX Classes

[:alpha:]	Letters [a-zA-Z]
[:digit:]	Digits [0-9]
[:alnum:]	Letters and digits
[:space:]	Whitespace
[:upper:]	Uppercase letters
[:lower:]	Lowercase letters
[:punct:]	Punctuation
[:print:]	Printable characters
[:xdigit:]	Hex digits [0-9a-fA-F]

MQTT & IoT

Topic validation

^(?:[^/+#\x00-\x1f\x7f]+)(?:\/(?:[^/+#\x00-\x1f\x7f]*))*$	Publish topic: no wildcards, no control chars; from MQTT spec 4.7 + Node-RED isValidPublishTopic()
^(?:[^/+#\x00-\x1f\x7f]*|\+)(?:\/(?:[^/+#\x00-\x1f\x7f]*|\+))*(?:\/(#))?$	Subscribe topic filter: + single-level and # multi-level (end only); from MQTT spec + mqtt-regex
(?:^|\/)\+(?:\/|$)	Single-level wildcard +: finds + tokens inside a topic filter (mqtt-regex process_single)
(?:^|\/)#$	Multi-level wildcard #: valid only at end of topic filter (mqtt-regex process_multi)
(?:^|\/)(\+|#|[^/+#\x00-\x1f\x7f]*)(?=\/|$)	Tokenize topic: splits into level segments, wildcard-aware (mqtt-regex tokenize)
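
A Python sketch of publish-topic validation. The pattern here is an assumed form of the publish-topic rule (one or more levels separated by "/", no wildcards, no control characters), and looks_like_publish_topic is a hypothetical helper, not the Node-RED function. fullmatch is used because $ in Python also matches just before a trailing newline.

```python
import re

# Assumed pattern: a non-empty first level, then any number of "/level" parts.
PUBLISH_TOPIC = re.compile(
    r"(?:[^/+#\x00-\x1f\x7f]+)(?:/(?:[^/+#\x00-\x1f\x7f]*))*"
)

def looks_like_publish_topic(topic: str) -> bool:
    """Hypothetical helper: True if the whole topic fits the pattern."""
    return PUBLISH_TOPIC.fullmatch(topic) is not None

print(looks_like_publish_topic("home/living/temperature"))  # True
print(looks_like_publish_topic("home/+/temperature"))       # False
print(looks_like_publish_topic("sensors/#"))                # False
```

Note that this pattern requires a non-empty first level, so a topic with a leading "/" is rejected.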

Topic patterns by platform

^(?:myhome|home)\/(?:[a-z0-9]+)\/(?:[a-z0-9]+)\/(?:temperature|humidity|pressure|motion|light|power)$	Home sensor topic: home/<floor>/<room>/<measurement>; from HiveMQ MQTT Essentials Part 5
^homeassistant\/([a-z_]+)\/([a-zA-Z0-9_-]+)(?:\/([a-zA-Z0-9_-]+))?\/(?:config|state|set|status|availability)$	Home Assistant MQTT Discovery: homeassistant/<component>/[node_id/]<object_id>/<suffix>
^zigbee2mqtt\/(?:bridge\/(?:state|config|devices|groups|info|logging|request\/.+|response\/.+)|([^/]+)(?:\/(?:set|get|availability))?)$	Zigbee2MQTT topic: bridge control, or <device_id> with optional /set, /get, or /availability (from mqtt.ts)
^\$SYS\/broker\/(?:clients\/(?:connected|disconnected|maximum|total)|messages\/(?:sent|received|stored)|uptime|version|load\/\w+)$	Mosquitto $SYS: broker stats for clients, messages, uptime, version, load
^\$aws\/things\/[^/]+\/shadow\/(?:get|update|delete)$	AWS IoT Core device shadow: get, update, or delete operations
^devices\/([^\/]+)\/messages\/(?:events|devicebound)	Azure IoT Hub: device-to-cloud events and cloud-to-device messages
^[A-Z]{2,6}\/[A-Z0-9_]+\/[A-Z0-9_]+$	Industrial SCADA topic — AREA/DEVICE/PARAM naming used in factory automation

Mosquitto log parsing

^(\d{10}):\s+(.+)$	Mosquitto log line — splits Unix epoch timestamp and message body (default, no log_timestamp_format)
^(\d{10}|(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})):\s+New (?:bridge )?client connected from ([\d.]+):(\d+) as (\S+) \(p(\d+), c([01]), k(\d+)(?:, u'([^']*)')?\)	Mosquitto CONNECT: captures ts, IP, port, clientId, protocol, clean-session, keepalive, username
Received PUBLISH from (\S+) \(d(\d), q(\d), r(\d), m(\d+), '([^']+)', \.\.\. \((\d+) bytes\)\)	Mosquitto PUBLISH debug: clientId, dup, QoS, retain, msgId, topic, bytes (handle_publish.c)
^(\d{10}|(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})):\s+Client (\S+) disconnected\.	Mosquitto DISCONNECT: captures timestamp and clientId
\bas\s+([a-zA-Z0-9_:\-\.]+)\s+\(	Extract clientId from Mosquitto CONNECT log — 'as <clientId> (' format (handle_connect.c)
\bq(?:os)?[=:\s]?([0-2])\b	QoS level from Mosquitto debug — matches (d0, q1, r0) or subscribe log entries
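
A Python sketch that parses a Mosquitto CONNECT log line. It assumes the "(p…, c…, k…, u'…')" layout shown in this cheatsheet's log example; the named groups and the second sample line (without a username) are additions for illustration.

```python
import re

CONNECT = re.compile(
    r"^(?P<ts>\d{10}|\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}):\s+"
    r"New (?:bridge )?client connected from (?P<ip>[\d.]+):(?P<port>\d+) "
    r"as (?P<client_id>\S+) "
    r"\(p(?P<protocol>\d+), c(?P<clean>[01]), k(?P<keepalive>\d+)"
    r"(?:, u'(?P<username>[^']*)')?\)"
)

line = ("1711234567: New client connected from 192.168.1.42:49823 "
        "as esp_sensor_01 (p2, c1, k60, u'homelab')")
info = CONNECT.match(line).groupdict()

# The username part is optional, so its group is None when it is absent.
anonymous = CONNECT.match(
    "1711234567: New client connected from 10.0.0.5:50000 as probe (p2, c1, k30)"
).groupdict()

print(info["client_id"], info["ip"], info["keepalive"])  # esp_sensor_01 192.168.1.42 60
print(anonymous["username"])                             # None
```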

Payload & device identifiers

\{[\s\S]*\}	JSON payload — extract inline JSON from log output or MQTT debug capture
"(?:temperature|temp|t)"\s*:\s*(-?\d+(?:\.\d+)?)|"(?:humidity|hum|h)"\s*:\s*(\d+(?:\.\d+)?)	Temp / humidity: extract numeric values from JSON sensor payload (HA docs format)
^(ON|OFF|true|false|1|0|OPEN|CLOSED|LOCK|UNLOCK|ONLINE|OFFLINE)$	Boolean / state payload: switch, lock, cover, availability (Home Assistant MQTT)
"(?:lat(?:itude)?|lon(?:gitude)?|lng|alt(?:itude)?)"\s*:\s*(-?\d+\.\d+)	GPS payload: extracts lat/lon/altitude from JSON tracker topic (HiveMQ asset tracking example)
"battery"\s*:\s*(\d{1,3})	Battery level: extract % from JSON, useful for low-charge alerting
(?:[0-9A-Fa-f]{2}[:\-]){5}[0-9A-Fa-f]{2}	MAC address — device identifier in topic paths and Zigbee2MQTT payloads
[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}	Device UUID — cloud IoT platform identifier embedded in topic or payload
\besp(?:_|-)?[a-z0-9_-]{4,32}\b	ESP8266/ESP32 client ID: matches esp_device_01, esp-abc123 in Mosquitto CONNECT logs
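
A Python sketch that pulls values out of a sensor payload with the temperature and battery patterns, written out in full here. The payload is the sample from this cheatsheet; when the whole payload is valid JSON, parsing it with json.loads is usually the more robust route, and the regex is mainly useful for payloads embedded in log text.

```python
import json
import re

payload = '{"temperature":21.4,"humidity":58,"battery":87,"status":"online"}'

TEMP = re.compile(r'"(?:temperature|temp|t)"\s*:\s*(-?\d+(?:\.\d+)?)')
BATTERY = re.compile(r'"battery"\s*:\s*(\d{1,3})')

temperature = float(TEMP.search(payload).group(1))
battery = int(BATTERY.search(payload).group(1))

# Full JSON parsing gives the same values without any pattern matching.
data = json.loads(payload)

print(temperature, battery)  # 21.4 87
print(data["humidity"])      # 58
```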

Timestamps

^(\d{10})(?=:\s)	Unix epoch (seconds) at start of Mosquitto log line — 10 digits before ': ' (logging.c)
\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?	ISO 8601 timestamp: when Mosquitto log_timestamp_format = %Y-%m-%dT%H:%M:%S is set

Topic wildcard examples

home/+/temperature → matches home/living/temperature, home/bedroom/temperature
sensors/# → matches sensors/any/depth/path
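
One way to test topics against a filter is to translate the filter into a regex. The sketch below is a simplified assumption of how that can be done in Python; filter_to_regex and topic_matches are hypothetical helpers, not the mqtt-regex API. It does not validate the filter fully and ignores the special handling of topics that start with "$".

```python
import re

def filter_to_regex(topic_filter: str) -> re.Pattern:
    """Compile an MQTT topic filter into a regex (hypothetical helper)."""
    levels = topic_filter.split("/")
    multi = levels[-1] == "#"          # trailing multi-level wildcard
    if multi:
        levels = levels[:-1]
    if "#" in levels:
        raise ValueError("# is only valid as the last level")
    # + matches exactly one level; everything else is taken literally.
    body = "/".join("[^/]*" if lvl == "+" else re.escape(lvl) for lvl in levels)
    if multi:
        # # matches the parent level and any number of child levels.
        body = (body + "(?:/.*)?") if levels else ".*"
    return re.compile(body, re.DOTALL)

def topic_matches(topic_filter: str, topic: str) -> bool:
    return filter_to_regex(topic_filter).fullmatch(topic) is not None

print(topic_matches("home/+/temperature", "home/living/temperature"))  # True
print(topic_matches("home/+/temperature", "home/a/b/temperature"))     # False
print(topic_matches("sensors/#", "sensors/any/depth/path"))            # True
```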

Mosquitto log example (default format)

1711234567: New client connected from 192.168.1.42:49823 as esp_sensor_01 (p2, c1, k60, u'homelab')

JSON sensor payload (Home Assistant)

      {"temperature":21.4,"humidity":58,"battery":87,"status":"online"}