Regex

💡
Lark supports RE2 syntax, and regular expressions are case-sensitive by default.
Regex examples
The following examples show how to construct simple regular expressions. Each example includes the type of text to match, the regular expression(s) that match it, and notes explaining the special characters and formatting used.
1. Match exact phrase only
Example scenario
Match the phrase "stock tips"
Regex examples
Example 1: (\W|^)stock\stips(\W|$)
Example 2: (\W|^)stock\s{0,3}tips(\W|$)
Example 3: (\W|^)stock\s{0,3}tip(s){0,1}(\W|$)
Notes
  • \W matches any character that isn't a letter, number, or underscore. Requiring a \W character (or ^ or $) on each side stops the regex from matching the phrase inside a longer word, such as "bigstock tips". The \W character itself becomes part of the match.
  • In examples 2 and 3, \s matches a single whitespace character (such as a space or tab), and {0,3} allows 0 to 3 of them between the words stock and tips.
  • ^ matches the beginning of the text (or of each line when multi-line mode, flag m, is on). It allows the regex to match the phrase when no other characters precede it.
  • $ matches the end of the text (or of each line when multi-line mode, flag m, is on). It allows the regex to match the phrase when no other characters follow it.
  • In example 3, (s) matches the letter "s", and {0,1} allows it to appear 0 or 1 times after "tip". The regex therefore matches both stock tip and stock tips. You can also write ? instead of {0,1}.
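You can try the three patterns above with Go's standard `regexp` package, which implements RE2 syntax. This is a minimal sketch. It assumes Go's matcher behaves like Lark's for these patterns. The variable names and sample inputs are illustrative only.

```go
package main

import (
	"fmt"
	"regexp"
)

// The three "stock tips" patterns from examples 1-3 (variable names are illustrative).
var (
	exact     = regexp.MustCompile(`(\W|^)stock\stips(\W|$)`)
	spaces    = regexp.MustCompile(`(\W|^)stock\s{0,3}tips(\W|$)`)
	optionalS = regexp.MustCompile(`(\W|^)stock\s{0,3}tip(s){0,1}(\W|$)`)
)

func main() {
	inputs := []string{
		"stock tips",          // the exact phrase
		"stock   tips",        // three spaces between the words
		"Try this stock tip.", // singular "tip" followed by punctuation
		"bigstock tips",       // preceded by a letter, so there is no \W boundary
	}
	for _, s := range inputs {
		fmt.Printf("%-22q exact=%-5v spaces=%-5v optionalS=%v\n",
			s, exact.MatchString(s), spaces.MatchString(s), optionalS.MatchString(s))
	}
}
```

Example 1 matches only the first input. Example 2 also accepts the extra spaces, and example 3 additionally accepts the singular tip. None of them matches bigstock tips.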
2. Match words or phrases in a list
Example scenario
Match any of the words or phrases from the list below:
  • baloney
  • darn
  • drat
  • fooey
  • gosh darnit
  • heck
Regex examples
(?i)(\W|^)(baloney|darn|drat|fooey|gosh\sdarnit|heck)(\W|$)
Notes
  • (…) groups words together so that the \W character class can be applied to all of them.
  • (?i) turns on case-insensitive matching, so the regex also matches variants such as Darn or HECK.
  • \W matches any character that isn't a letter, number, or underscore. Requiring a \W character (or ^ or $) on each side stops the regex from matching a listed word inside a longer word, such as darn in darned.
  • ^ matches the beginning of the text (or of each line when multi-line mode, flag m, is on). It allows the regex to match a word when no other characters precede it.
  • $ matches the end of the text (or of each line when multi-line mode, flag m, is on). It allows the regex to match a word when no other characters follow it.
  • | represents "or," allowing this regex to match any of the words in the list.
  • \s matches a single whitespace character. Use it to separate the words in a phrase, as in gosh\sdarnit.
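The word-list pattern can be checked with Go's RE2-based `regexp` package. This sketch assumes Go behaves like Lark here. `findListedWord` is a hypothetical helper: it reads capture group 2, which holds the listed word or phrase that was found.

```go
package main

import (
	"fmt"
	"regexp"
)

// The word-list pattern from the example above. Group 2 holds the matched word.
var wordList = regexp.MustCompile(`(?i)(\W|^)(baloney|darn|drat|fooey|gosh\sdarnit|heck)(\W|$)`)

// findListedWord (hypothetical helper) returns the listed word found in s, or "" if none.
func findListedWord(s string) string {
	m := wordList.FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[2]
}

func main() {
	for _, s := range []string{
		"Oh HECK, not again", // different case, still found because of (?i)
		"Gosh darnit!",       // two-word phrase separated by \s
		"darning socks",      // "darn" only as the start of a longer word
	} {
		fmt.Printf("%q -> %q\n", s, findListedWord(s))
	}
}
```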
3. Match words with different spellings or special characters
Example scenario
Match the word "diagram" when spelled using special characters, such as:
  • di@gram
  • d1agram
  • d1@gram
  • d!@gr@m
Regex examples
d[i!1][a@]gr[a@]m
Notes
  • \W is not used here, so any characters may appear before or after a variant of the word diagram. For example, the regex still matches diagram in diagram!! or ***diagram***, and even inside a longer word such as diagrams.
  • [i!1] matches i, !, or 1 in the second character position of the word, and [a@] matches a or @ in the third and sixth positions.
4. Match all email addresses for a specific domain
Example scenario
Match any email address with a yahoo.com, hotmail.com, or gmail.com domain.
Regex examples
(\W|^)[\w.\-]{0,25}@(yahoo|hotmail|gmail)\.com(\W|$)
Notes
  • \W matches any character that isn't a letter, number, or underscore. Requiring a \W character (or ^ or $) on each side stops the regex from matching an address that runs into other word characters, such as user@gmail.community.
  • ^ matches the beginning of the text (or of each line when multi-line mode, flag m, is on). It allows the regex to match an address when no other characters precede it.
  • $ matches the end of the text (or of each line when multi-line mode, flag m, is on). It allows the regex to match an address when no other characters follow it.
  • [\w.\-] matches any word character (a–z, A–Z, 0–9, or underscore), period, or hyphen. These are the most commonly used valid characters in the part of an email address before the @. The hyphen is escaped as \- so that it is read as a literal hyphen rather than as part of a range such as a-z. An unescaped hyphen is literal only when it is the first or last character in the brackets.
  • A backslash escapes the character after it, so that the character is matched literally instead of having a special meaning. In \.com the period must be escaped, because an unescaped . matches any character. Inside square brackets, a period is already literal and needs no escape.
  • {0,25} represents how many characters can appear before the @ symbol, i.e., 0–25.
  • (…) groups domains together, with the | symbol separating the domains representing "or."
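The following Go sketch runs the email pattern against a few made-up inputs. It uses Go's RE2-based `regexp` package and assumes Lark behaves the same for this pattern. Capture group 2 holds the domain name. The failing inputs show the effect of the escaped period and of the trailing (\W|$).

```go
package main

import (
	"fmt"
	"regexp"
)

// The email pattern from the example above. Group 2 captures yahoo, hotmail, or gmail.
var freeMail = regexp.MustCompile(`(\W|^)[\w.\-]{0,25}@(yahoo|hotmail|gmail)\.com(\W|$)`)

func main() {
	for _, s := range []string{
		"Contact jane.doe-1@gmail.com today", // matches; domain is gmail
		"user@outlook.com",                   // domain not in the list
		"user@gmailXcom",                     // \. requires a literal period
		"user@gmail.community",               // .com must end the address
	} {
		if m := freeMail.FindStringSubmatch(s); m != nil {
			fmt.Printf("%q: match, domain=%s\n", s, m[2])
		} else {
			fmt.Printf("%q: no match\n", s)
		}
	}
}
```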
5. Match all IP addresses in a range
Example scenario
Match all IP addresses within the range 192.168.1.0 to 192.168.1.255.
Regex examples
Example 1: 192\.168\.1\.
Example 2: 192\.168\.1\.\d{1,3}
Notes
  • The backslash before each period escapes it, so it matches only a literal period. An unescaped . would match any character.
  • In example 1, nothing follows the final period, so the regular expression matches every IP address that starts with 192.168.1., regardless of what comes next.
  • In example 2, \d matches any digit from 0 to 9, and {1,3} requires 1 to 3 digits after the final period, so the regex matches complete IP addresses starting with 192.168.1. It also matches invalid addresses such as 192.168.1.999, because \d{1,3} does not check the numeric value.
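The following Go sketch compares both IP patterns with a version that leaves the periods unescaped. Go's `regexp` package implements RE2 and stands in for Lark's matcher here. The `unescaped` pattern and the sample strings are illustrative additions, not part of the examples above.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	prefix    = regexp.MustCompile(`192\.168\.1\.`)        // example 1
	withOctet = regexp.MustCompile(`192\.168\.1\.\d{1,3}`) // example 2
	// For comparison only: the same prefix with unescaped periods.
	unescaped = regexp.MustCompile(`192.168.1.`)
)

func main() {
	fmt.Println(withOctet.FindString("host 192.168.1.42 is up")) // prints 192.168.1.42
	fmt.Println(withOctet.MatchString("192.168.1.999"))          // true: the value is not checked
	// An unescaped . matches any character, so "192x168x1x" slips through.
	fmt.Println(prefix.MatchString("192x168x1x"), unescaped.MatchString("192x168x1x")) // false true
}
```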
6. Match alphanumeric formats
Example scenario
Match a company's purchase order number, which may use various formats (n stands for a digit), such as:
  • PO nn-nnnn
  • PO-nn-nnnn
  • PO# nn nnnn
  • PO#nn-nnnn
  • PO nnnnnn
Regex examples
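(?i)(\W|^)po[#\-]{0,1}\s{0,1}\d{2}[\s-]{0,1}\d{4}(\W|$)
Notes
  • (?i) turns on case-insensitive matching. Lark regular expressions are case-sensitive by default, so without it the lowercase po in the pattern would not match the uppercase PO in these numbers.
  • \W matches any character that isn't a letter, number, or underscore. Requiring a \W character (or ^ or $) on each side stops the regex from matching a number that is part of a longer string, such as one with an extra digit after the last four.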
  • ^ matches the beginning of the text (or of each line when multi-line mode, flag m, is on). It allows the regex to match a number when no other characters precede it.
  • $ matches the end of the text (or of each line when multi-line mode, flag m, is on). It allows the regex to match a number when no other characters follow it.
  • [#\-] matches a pound sign or hyphen after the letters "PO", and {0,1} allows at most one of these characters. The hyphen is escaped as \- so it is read literally. In [\s-] later in the pattern, the unescaped hyphen is also literal because it is the last character in the brackets.
  • \s matches a whitespace character, and {0,1} indicates that it can occur 0 or 1 times.
  • \d matches any digit from 0 to 9, and {2} requires exactly 2 digits at this point in the PO number. Likewise, [\s-]{0,1} allows an optional space or hyphen, and \d{4} requires exactly 4 digits after it.
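The following Go sketch checks the PO-number formats listed above, using Go's RE2-based `regexp` package as a stand-in for Lark. It compiles the pattern twice: once with (?i) prepended, and once without it, to show that a case-sensitive po does not match PO. The sample numbers are made up. A number with five digits after the hyphen is rejected, because \d{4} must be followed by a non-word character or the end.

```go
package main

import (
	"fmt"
	"regexp"
)

// PO pattern with (?i) so that "po" in the pattern also matches "PO".
var poNumber = regexp.MustCompile(`(?i)(\W|^)po[#\-]{0,1}\s{0,1}\d{2}[\s-]{0,1}\d{4}(\W|$)`)

// The same pattern without (?i): matching is case-sensitive.
var poCaseSensitive = regexp.MustCompile(`(\W|^)po[#\-]{0,1}\s{0,1}\d{2}[\s-]{0,1}\d{4}(\W|$)`)

func main() {
	for _, s := range []string{
		"PO 12-3456", "PO-12-3456", "PO# 12 3456", "PO#12-3456", "PO 123456",
		"PO 12-34567", // five digits after the hyphen: not matched
	} {
		fmt.Printf("%-14q (?i): %-5v case-sensitive: %v\n",
			s, poNumber.MatchString(s), poCaseSensitive.MatchString(s))
	}
}
```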
Expressions appendix
Single-character expressions
.               any character, possibly including newline (s=true)
[xyz]           character class
[^xyz]          negated character class
\d              Perl character class
\D              negated Perl character class
[[:alpha:]]     ASCII character class
[[:^alpha:]]    negated ASCII character class
\pN             Unicode character class (one-letter name)
\p{Greek}       Unicode character class
\PN             negated Unicode character class (one-letter name)
\P{Greek}       negated Unicode character class
Composites
xy      x followed by y
x|y     x or y (prefer x)
Repetitions
x*          zero or more x, prefer more
x+          one or more x, prefer more
x?          zero or one x, prefer one
x{n,m}      n or n+1 or ... or m x, prefer more
x{n,}       n or more x, prefer more
x{n}        exactly n x
x*?         zero or more x, prefer fewer
x+?         one or more x, prefer fewer
x??         zero or one x, prefer zero
x{n,m}?     n or n+1 or ... or m x, prefer fewer
x{n,}?      n or more x, prefer fewer
x{n}?       exactly n x
x\{}        (≡ x*) (NOT SUPPORTED) VIM
x\{-}       (≡ x*?) (NOT SUPPORTED) VIM
x\{-n}      (≡ x{n}?) (NOT SUPPORTED) VIM
x\=         (≡ x?) (NOT SUPPORTED) VIM
Implementation restriction: The counting forms x{n,m}, x{n,}, and x{n} reject forms that create a minimum or maximum repetition count above 1000. Unlimited repetitions are not subject to this restriction.
Possessive repetitions
x*+         zero or more x, possessive (NOT SUPPORTED)
x++         one or more x, possessive (NOT SUPPORTED)
x?+         zero or one x, possessive (NOT SUPPORTED)
x{n,m}+     n or ... or m x, possessive (NOT SUPPORTED)
x{n,}+      n or more x, possessive (NOT SUPPORTED)
x{n}+       exactly n x, possessive (NOT SUPPORTED)
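The greedy/lazy preference, the 1000-repetition limit, and the rejection of possessive forms can all be observed in Go's `regexp` package, which implements RE2. This sketch assumes Lark's RE2 engine behaves the same way. `compiles` is a hypothetical helper that only reports whether a pattern is accepted.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	greedy = regexp.MustCompile(`<.+>`)  // x+ : prefer more
	lazy   = regexp.MustCompile(`<.+?>`) // x+?: prefer fewer
)

// compiles (hypothetical helper) reports whether the RE2 parser accepts pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	s := "<a><b>"
	fmt.Println(greedy.FindString(s)) // <a><b>
	fmt.Println(lazy.FindString(s))   // <a>
	// Counted repetitions above 1000 are rejected.
	fmt.Println(compiles(`a{1000}`), compiles(`a{1001}`)) // true false
	// Possessive repetitions are not supported.
	fmt.Println(compiles(`a*+`)) // false
}
```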
Flags
i    case-insensitive (default false)
m    multi-line mode: ^ and $ match begin/end line in addition to begin/end text (default false)
s    let . match \n (default false)
U    ungreedy: swap meaning of x* and x*?, x+ and x+?, etc. (default false)
Flag syntax is xyz (set), -xyz (clear), or xy-z (set xy, clear z). Flags are written inside a group, for example (?i) or (?i-s:re).
Grouping
(re)            numbered capturing group (submatch)
(?P<name>re)    named & numbered capturing group (submatch)
(?<name>re)     named & numbered capturing group (submatch) (NOT SUPPORTED)
(?'name're)     named & numbered capturing group (submatch) (NOT SUPPORTED)
(?:re)          non-capturing group
(?flags)        set flags within current group; non-capturing
(?flags:re)     set flags during re; non-capturing
(?#text)        comment (NOT SUPPORTED)
(?|x|y|z)       branch numbering reset (NOT SUPPORTED)
(?>re)          possessive match of re (NOT SUPPORTED)
re\@>           possessive match of re (NOT SUPPORTED) VIM
\%(re)          non-capturing group (NOT SUPPORTED) VIM
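The following Go sketch uses named groups in the supported (?P<name>re) form, plus a non-capturing group, with Go's RE2-based `regexp` package. The date pattern and inputs are illustrative. Go 1.22 and later also accept (?<name>re), so Go is more permissive than this table for that one form.

```go
package main

import (
	"fmt"
	"regexp"
)

// Named groups in the (?P<name>re) form.
var date = regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)

// (?:re) groups without capturing, so it adds no submatch.
var repeated = regexp.MustCompile(`(?:ab)+`)

func main() {
	m := date.FindStringSubmatch("Updated on 2023-01-18")
	for _, name := range []string{"year", "month", "day"} {
		fmt.Printf("%s=%s\n", name, m[date.SubexpIndex(name)])
	}
	fmt.Println(repeated.FindString("ababx"), repeated.NumSubexp()) // abab 0
}
```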
Empty strings
^          at beginning of text or line (m=true)
$          at end of text (like \z not \Z) or line (m=true)
\A         at beginning of text
\b         at ASCII word boundary (\w on one side and \W, \A, or \z on the other)
\B         not at ASCII word boundary
\g         at beginning of subtext being searched (NOT SUPPORTED) PCRE
\G         at end of last match (NOT SUPPORTED) PERL
\Z         at end of text, or before newline at end of text (NOT SUPPORTED)
\z         at end of text
(?=re)     before text matching re (NOT SUPPORTED)
(?!re)     before text not matching re (NOT SUPPORTED)
(?<=re)    after text matching re (NOT SUPPORTED)
(?<!re)    after text not matching re (NOT SUPPORTED)
re\&       before text matching re (NOT SUPPORTED) VIM
re\@=      before text matching re (NOT SUPPORTED) VIM
re\@!      before text not matching re (NOT SUPPORTED) VIM
re\@<=     after text matching re (NOT SUPPORTED) VIM
re\@<!     after text not matching re (NOT SUPPORTED) VIM
\zs        sets start of match (= \K) (NOT SUPPORTED) VIM
\ze        sets end of match (NOT SUPPORTED) VIM
\%^        beginning of file (NOT SUPPORTED) VIM
\%$        end of file (NOT SUPPORTED) VIM
\%V        on screen (NOT SUPPORTED) VIM
\%#        cursor position (NOT SUPPORTED) VIM
\%'m       mark m position (NOT SUPPORTED) VIM
\%23l      in line 23 (NOT SUPPORTED) VIM
\%23c      in column 23 (NOT SUPPORTED) VIM
\%23v      in virtual column 23 (NOT SUPPORTED) VIM
Escape sequences
\a            bell (≡ \007)
\f            form feed (≡ \014)
\t            horizontal tab (≡ \011)
\n            newline (≡ \012)
\r            carriage return (≡ \015)
\v            vertical tab character (≡ \013)
\*            literal *, for any punctuation character *
\123          octal character code (up to three digits)
\x7F          hex character code (exactly two digits)
\x{10FFFF}    hex character code
\C            match a single byte even in UTF-8 mode
\Q...\E       literal text ... even if ... has punctuation
\1            backreference (NOT SUPPORTED)
\b            backspace (NOT SUPPORTED) (use \010)
\cK           control char ^K (NOT SUPPORTED) (use \001 etc.)
\e            escape (NOT SUPPORTED) (use \033)
\g1           backreference (NOT SUPPORTED)
\g{1}         backreference (NOT SUPPORTED)
\g{+1}        backreference (NOT SUPPORTED)
\g{-1}        backreference (NOT SUPPORTED)
\g{name}      named backreference (NOT SUPPORTED)
\g<name>      subroutine call (NOT SUPPORTED)
\g'name'      subroutine call (NOT SUPPORTED)
\k<name>      named backreference (NOT SUPPORTED)
\k'name'      named backreference (NOT SUPPORTED)
\lX           lowercase X (NOT SUPPORTED)
\ux           uppercase x (NOT SUPPORTED)
\L...\E       lowercase text ... (NOT SUPPORTED)
\K            reset beginning of $0 (NOT SUPPORTED)
\N{name}      named Unicode character (NOT SUPPORTED)
\R            line break (NOT SUPPORTED)
\U...\E       uppercase text ... (NOT SUPPORTED)
\X            extended Unicode sequence (NOT SUPPORTED)
\%d123        decimal character 123 (NOT SUPPORTED) VIM
\%xFF         hex character FF (NOT SUPPORTED) VIM
\%o123        octal character 123 (NOT SUPPORTED) VIM
\%u1234       Unicode character 0x1234 (NOT SUPPORTED) VIM
\%U12345678   Unicode character 0x12345678 (NOT SUPPORTED) VIM
Character class elements
x          single character
A-Z        character range (inclusive)
\d         Perl character class
[:foo:]    ASCII character class foo
\p{Foo}    Unicode character class Foo
\pF        Unicode character class F (one-letter name)
Named character classes as character class elements
[\d]           digits (≡ \d)
[^\d]          not digits (≡ \D)
[\D]           not digits (≡ \D)
[^\D]          not not digits (≡ \d)
[[:name:]]     named ASCII class inside character class (≡ [:name:])
[^[:name:]]    named ASCII class inside negated character class (≡ [:^name:])
[\p{Name}]     named Unicode property inside character class (≡ \p{Name})
[^\p{Name}]    named Unicode property inside negated character class (≡ \P{Name})
Written by: Lark Help Center
Updated on 2023-01-18