Regular expression syntax

💡
Lark supports RE2 syntax. Regular expressions are case-sensitive by default.
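Regular expression examples
The following examples show how to use and construct simple regular expressions. Each example includes the type of text to match, one or more regular expressions that match that text, and notes that explain the special characters and formatting used.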
Match an exact phrase only
Usage example
Match the phrase stock tips.
Regular expression examples
Example 1: (\W|^)stock\stips(\W|$)
Example 2: (\W|^)stock\s{0,3}tips(\W|$)
Example 3: (\W|^)stock\s{0,3}tip(s){0,1}(\W|$)
Notes
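  • \W matches any character other than a letter, digit, or underscore. It keeps the regex from matching when a letter, digit, or underscore directly precedes or follows the phrase.
  • In example 2, \s matches a whitespace character, and {0,3} means that 0 to 3 whitespace characters may appear between the words stock and tips.
  • ^ matches the start of a line. It allows the regex to match the phrase when it appears at the start of a line, with no characters before it.
  • $ matches the end of a line. It allows the regex to match the phrase when it appears at the end of a line, with no characters after it.
  • In example 3, (s) matches the letter s, and {0,1} means that the letter may appear 0 or 1 times after the word tip. The regex therefore matches both stock tip and stock tips. You can also use the ? character in place of {0,1}.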
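The three patterns can be tried out with a short script. This is a minimal sketch in Python, whose built-in re module is not RE2 but accepts every construct these patterns use. re.ASCII is passed so that \W and \s follow ASCII rules, which approximates RE2. The helper name is_match and the sample strings are illustrative assumptions, not part of Lark.

```python
import re

# The three example patterns from this section. re.ASCII keeps \W and \s
# ASCII-only, which approximates RE2 (an assumption of this sketch).
EXAMPLE_1 = re.compile(r"(\W|^)stock\stips(\W|$)", re.ASCII)
EXAMPLE_2 = re.compile(r"(\W|^)stock\s{0,3}tips(\W|$)", re.ASCII)
EXAMPLE_3 = re.compile(r"(\W|^)stock\s{0,3}tip(s){0,1}(\W|$)", re.ASCII)


def is_match(pattern, text):
    """Return True if the pattern is found anywhere in text."""
    return pattern.search(text) is not None


# Hypothetical sample inputs, chosen to exercise each pattern.
samples = ["hot stock tips!", "stock   tips", "stocktips", "stock tip", "livestock tips"]
for text in samples:
    results = [is_match(p, text) for p in (EXAMPLE_1, EXAMPLE_2, EXAMPLE_3)]
    print(f"{text!r}: example 1/2/3 -> {results}")
```

In this sketch, livestock tips is rejected by all three patterns because the letter e directly precedes stock, so neither \W nor ^ can match there.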
Match a word or phrase in a list
Usage example
Match any word or phrase in the following list:
  • baloney
  • darn
  • drat
  • fooey
  • gosh darnit
  • heck
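Regular expression example
(?i)(\W|^)(baloney|darn|drat|fooey|gosh\sdarnit|heck)(\W|$)
Notes
  • (…) groups all the words so that the \W character class applies to every word inside the parentheses.
  • (?i) makes the match case-insensitive.
  • \W matches any character other than a letter, digit, or underscore. It keeps the regex from matching when a letter, digit, or underscore directly precedes or follows a word or phrase in the list.
  • ^ matches the start of a line. It allows the regex to match a word when it appears at the start of a line, with no characters before it.
  • $ matches the end of a line. It allows the regex to match a word when it appears at the end of a line, with no characters after it.
  • | means "or", so the regex matches any one of the words in the list.
  • \s matches a whitespace character. Use it to separate the words in a phrase.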
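The word-list pattern can be checked the same way. The following is a minimal sketch in Python (whose re module is not RE2, although it supports every construct used here). The function name find_listed_word and the sample sentences are assumptions made for illustration.

```python
import re

# The word-list pattern from this section; (?i) makes it case-insensitive.
WORD_LIST = re.compile(
    r"(?i)(\W|^)(baloney|darn|drat|fooey|gosh\sdarnit|heck)(\W|$)",
    re.ASCII,  # keep \W and \s ASCII-only, approximating RE2
)


def find_listed_word(text):
    """Return the listed word or phrase found in text, or None."""
    match = WORD_LIST.search(text)
    # Group 2 is the alternation; groups 1 and 3 are the surrounding delimiters.
    return match.group(2) if match else None


for text in ["Oh HECK, not again", "gosh darnit!", "checkmate, heckler"]:
    print(f"{text!r} -> {find_listed_word(text)!r}")
```

The last sample returns None: heck inside check is preceded by a letter, and heck inside heckler is followed by one.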
Match a word with different spellings or special characters
Usage example
Match the word "viagra" and some of the obfuscations that spammers use, such as:
  • vi@gra
  • v1agra
  • v1@gra
  • v!@gr@
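Regular expression example
v[i!1][a@]gr[a@]
Notes
  • \W is not included, so other characters can appear before or after any variant of viagra. For example, the regex still matches viagra in the following text:
viagra!! or ***viagra***
  • [i!1] matches the character i, !, or 1 in the second position of the word.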
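As a quick check of the character classes above, this Python sketch runs the pattern against the listed variants. Python's re module is not RE2, but bracketed character classes behave the same way in both. The name find_variants and the test sentence are illustrative assumptions.

```python
import re

# The pattern from this section: each bracketed class accepts one character
# from the set it lists.
OBFUSCATED = re.compile(r"v[i!1][a@]gr[a@]")


def find_variants(text):
    """Return every substring of text that the pattern matches."""
    return OBFUSCATED.findall(text)


variants = ["viagra", "vi@gra", "v1agra", "v1@gra", "v!@gr@"]
for word in variants:
    print(word, OBFUSCATED.fullmatch(word) is not None)

# No \W delimiters, so surrounding punctuation does not prevent a match.
print(find_variants("viagra!! ***v1@gr@***"))
```

Because matching is case-sensitive by default, a capitalized Viagra is not matched by this pattern as written.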
Match all email addresses from a specific domain
Usage example
Match any email address from the domains yahoo.com, hotmail.com, and gmail.com.
Regular expression example
(\W|^)[\w.\-]{0,25}@(yahoo|hotmail|gmail)\.com(\W|$)
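Notes
  • \W matches any character other than a letter, digit, or underscore. It keeps the regex from matching when a letter, digit, or underscore directly precedes or follows the email address.
  • ^ matches the start of a line. It allows the regex to match an address when it appears at the start of a line, with no characters before it.
  • $ matches the end of a line. It allows the regex to match an address when it appears at the end of a line, with no characters after it.
  • [\w.\-] matches any word character (a-z, A-Z, 0-9, or underscore), a period, or a hyphen. These are the most commonly used valid characters in the first part of an email address. The hyphen is written as \- and placed last in the bracketed list so that it is read as a literal hyphen rather than as a range operator.
  • The \ before the hyphen and the period escapes those characters, that is, it marks them as literal characters rather than regex special characters. Inside square brackets, the period does not need to be escaped.
  • {0,25} means that 0 to 25 characters from the preceding character set may appear before the @ symbol.
  • The (…) grouping collects the domains, and the | character between them means "or".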
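The following Python sketch shows how the pieces of the email pattern interact. It assumes Python's re module as a stand-in for RE2 (the constructs used are common to both) and passes re.ASCII so that \w stays ASCII-only. The helper domain_of and the sample addresses are hypothetical.

```python
import re

# The email pattern from this section.
EMAIL = re.compile(
    r"(\W|^)[\w.\-]{0,25}@(yahoo|hotmail|gmail)\.com(\W|$)",
    re.ASCII,
)


def domain_of(text):
    """Return 'yahoo', 'hotmail', or 'gmail' if text contains a matching address."""
    match = EMAIL.search(text)
    # Group 2 is the (yahoo|hotmail|gmail) alternation.
    return match.group(2) if match else None


samples = [
    "jane.doe@gmail.com",
    "Contact: bob-smith@yahoo.com, thanks",
    "alice@example.com",
    "alice@gmail.community",
    "a" * 30 + "@gmail.com",  # local part longer than the {0,25} limit
]
for text in samples:
    print(f"{text!r} -> {domain_of(text)!r}")
```

The last sample is rejected because {0,25} allows at most 25 characters before the @ symbol and nothing other than the start of the text can satisfy (\W|^) in front of them.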
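Match all IP addresses in a range
Usage example
Match all IP addresses in the range 192.168.1.0 through 192.168.1.255.
Regular expression examples
Example 1: 192\.168\.1\.
Example 2: 192\.168\.1\.\d{1,3}
Notes
  • The \ before each period escapes it, that is, it marks the period as a literal character rather than a regex special character.
  • In example 1, nothing follows the last period, so the regex matches every IP address that starts with 192.168.1., regardless of the digits that follow.
  • In example 2, \d matches any digit from 0 to 9 after the last period, and {1,3} means that 1 to 3 digits may appear there. In this case, the regex matches every complete IP address that starts with 192.168.1. It also matches invalid IP addresses, such as 192.168.1.999.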
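To make the escaping and the 192.168.1.999 caveat concrete, here is a small Python sketch (Python's re module is not RE2, but these patterns behave the same in both). The function last_octet_is_valid is a hypothetical addition that goes beyond the regex by checking the numeric range in code; it is one possible approach, not something Lark provides.

```python
import re

PREFIX_ONLY = re.compile(r"192\.168\.1\.")                    # example 1
FULL_ADDRESS = re.compile(r"192\.168\.1\.\d{1,3}", re.ASCII)  # example 2
UNESCAPED = re.compile(r"192.168.1.")                         # periods left unescaped


def last_octet_is_valid(address):
    """Hypothetical extra check: the final number must be in the range 0-255."""
    match = re.fullmatch(r"192\.168\.1\.(\d{1,3})", address, re.ASCII)
    return match is not None and int(match.group(1)) <= 255


print(PREFIX_ONLY.search("192.168.1.42").group())
print(FULL_ADDRESS.search("host 192.168.1.42 is up").group())
# An unescaped period matches any character, so this unrelated text matches.
print(UNESCAPED.search("192x168y1z") is not None)
# Example 2 accepts 999, while the numeric check rejects it.
print(FULL_ADDRESS.search("192.168.1.999") is not None, last_octet_is_valid("192.168.1.999"))
```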
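Match an alphanumeric format
Usage example
Match a company's purchase order numbers. The number can appear in several formats, for example: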
  • PO nn-nnnn
  • PO-nn-nnnn
  • PO# nn nnnn
  • PO#nn-nnnn
  • PO nnnnnn
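Regular expression example
(\W|^)po[#\-]{0,1}\s{0,1}\d{2}[\s-]{0,1}\d{4}(\W|$)
Notes
  • \W matches any character other than a letter, digit, or underscore. It keeps the regex from matching when a letter, digit, or underscore directly precedes or follows the number.
  • ^ matches the start of a line. It allows the regex to match a number when it appears at the start of a line, with no characters before it.
  • $ matches the end of a line. It allows the regex to match a number when it appears at the end of a line, with no characters after it.
  • [#\-] matches a pound sign or a hyphen after the letters po, and {0,1} means that one of these characters may appear 0 or 1 times. The hyphen is written as \- and placed last in the bracketed list so that it is read as a literal hyphen rather than as a range operator.
  • \s matches a whitespace character, and {0,1} means that it may appear 0 or 1 times.
  • \d matches any digit from 0 to 9, and {2} means that exactly 2 digits must appear at this position in the number.
  • The regex spells po in lowercase. Because matching is case-sensitive by default, add (?i) at the start of the regex to also match the uppercase PO shown in the formats above.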
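The purchase order pattern can be exercised with the sketch below, written in Python (not RE2, although the constructs used are shared). Note one assumption: the pattern spells po in lowercase and matching is case-sensitive by default, so the sketch also builds a variant prefixed with (?i) to match the uppercase PO formats. That prefix, the name PO_ANY_CASE, and the sample numbers are additions for illustration.

```python
import re

PATTERN = r"(\W|^)po[#\-]{0,1}\s{0,1}\d{2}[\s-]{0,1}\d{4}(\W|$)"

PO_AS_WRITTEN = re.compile(PATTERN, re.ASCII)
# Assumption of this sketch: (?i) is prepended so that "PO" also matches.
PO_ANY_CASE = re.compile("(?i)" + PATTERN, re.ASCII)

samples = [
    "PO 12-3456",
    "PO-12-3456",
    "PO# 12 3456",
    "PO#12-3456",
    "PO 123456",
    "PO 12-34567",  # five trailing digits: \d{4} must be followed by \W or the end
]
for text in samples:
    as_written = PO_AS_WRITTEN.search(text) is not None
    any_case = PO_ANY_CASE.search(text) is not None
    print(f"{text!r}: as written={as_written}, with (?i)={any_case}")
```

As written, the pattern matches po 12-3456 but not PO 12-3456. With the (?i) prefix, it matches every sample except the one with five trailing digits.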
Syntax appendix
Single-character expressions
.               any character, possibly including newline (s=true)
[xyz]           character class
[^xyz]          negated character class
\d              Perl character class
\D              negated Perl character class
[[:alpha:]]     ASCII character class
[[:^alpha:]]    negated ASCII character class
\pN             Unicode character class (one-letter name)
\p{Greek}       Unicode character class
\PN             negated Unicode character class (one-letter name)
\P{Greek}       negated Unicode character class
Composites
xy              x followed by y
x|y             x or y (prefer x)
Repetitions
x*              zero or more x, prefer more
x+              one or more x, prefer more
x?              zero or one x, prefer one
x{n,m}          n or n+1 or ... or m x, prefer more
x{n,}           n or more x, prefer more
x{n}            exactly n x
x*?             zero or more x, prefer fewer
x+?             one or more x, prefer fewer
x??             zero or one x, prefer zero
x{n,m}?         n or n+1 or ... or m x, prefer fewer
x{n,}?          n or more x, prefer fewer
x{n}?           exactly n x
x{}             (≡ x*) (NOT SUPPORTED) VIM
x{-}            (≡ x*?) (NOT SUPPORTED) VIM
x{-n}           (≡ x{n}?) (NOT SUPPORTED) VIM
x=              (≡ x?) (NOT SUPPORTED) VIM
Implementation restriction: The counting forms x{n,m}, x{n,}, and x{n} reject forms that create a minimum or maximum repetition count above 1000. Unlimited repetitions are not subject to this restriction.
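The difference between "prefer more" and "prefer fewer" is easiest to see on a concrete string. The sketch below uses Python's re module, which is not RE2 but treats these greedy and non-greedy forms the same way; it does not reproduce RE2's limit of 1000 on repetition counts. The sample text and variable names are illustrative.

```python
import re

TEXT = "<b>bold</b> and <i>italic</i>"

# x+ prefers more: the match runs to the last ">" in the text.
greedy = re.search(r"<.+>", TEXT).group()
# x+? prefers fewer: the match stops at the first ">".
lazy = re.search(r"<.+?>", TEXT).group()

# x{n,m} takes as many as allowed; x{n,m}? takes as few as allowed.
counted_greedy = re.search(r"a{2,4}", "aaaaaa").group()
counted_lazy = re.search(r"a{2,4}?", "aaaaaa").group()

# x?? prefers zero occurrences, so the optional b is left out.
optional_lazy = re.search(r"ab??", "ab").group()

print(greedy)          # <b>bold</b> and <i>italic</i>
print(lazy)            # <b>
print(counted_greedy)  # aaaa
print(counted_lazy)    # aa
print(optional_lazy)   # a
```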
Possessive repetitions
x*+             zero or more x, possessive (NOT SUPPORTED)
x++             one or more x, possessive (NOT SUPPORTED)
x?+             zero or one x, possessive (NOT SUPPORTED)
x{n,m}+         n or ... or m x, possessive (NOT SUPPORTED)
x{n,}+          n or more x, possessive (NOT SUPPORTED)
x{n}+           exactly n x, possessive (NOT SUPPORTED)
Flags
i               case-insensitive (default false)
m               multi-line mode: ^ and $ match begin/end line in addition to begin/end text (default false)
s               let . match \n (default false)
U               ungreedy: swap meaning of x* and x*?, x+ and x+?, etc (default false)
Flag syntax is xyz (set) or -xyz (clear) or xy-z (set xy, clear z).
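The i, m, and s flags can be demonstrated with inline (?flags) groups. This is a minimal sketch in Python, whose re module accepts the same inline syntax for these three flags; the U flag is specific to RE2 and is not covered here. The helper found and the sample strings are illustrative assumptions.

```python
import re


def found(pattern, text):
    """Return True if pattern matches somewhere in text."""
    return re.search(pattern, text) is not None


# i: case-insensitive matching.
print(found(r"lark", "LARK"), found(r"(?i)lark", "LARK"))

# m: ^ and $ also match at line boundaries, not only at the text boundaries.
print(re.findall(r"^\w+", "one\ntwo"), re.findall(r"(?m)^\w+", "one\ntwo"))

# s: let . match a newline.
print(found(r"a.b", "a\nb"), found(r"(?s)a.b", "a\nb"))

# (?flags:re): the flag applies only inside the group.
print(found(r"(?i:la)rk", "LArk"), found(r"(?i:la)rk", "LARK"))
```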
Grouping
(re)            numbered capturing group (submatch)
(?P<name>re)    named & numbered capturing group (submatch)
(?<name>re)     named & numbered capturing group (submatch) (NOT SUPPORTED)
(?'name're)     named & numbered capturing group (submatch) (NOT SUPPORTED)
(?:re)          non-capturing group
(?flags)        set flags within current group; non-capturing
(?flags:re)     set flags during re; non-capturing
(?#text)        comment (NOT SUPPORTED)
(?|x|y|z)       branch numbering reset (NOT SUPPORTED)
(?>re)          possessive match of re (NOT SUPPORTED)
re@>            possessive match of re (NOT SUPPORTED) VIM
%(re)           non-capturing group (NOT SUPPORTED) VIM
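The supported grouping forms (re), (?P<name>re), and (?:re) can be compared side by side. The sketch below uses Python's re module, which is not RE2 but shares these three forms. The group names user and domain and the sample address are hypothetical.

```python
import re

ADDRESS = "jane.doe@gmail.com"

# (?P<name>re): groups are available both by name and by number.
named = re.search(r"(?P<user>[\w.\-]+)@(?P<domain>\w+)\.com", ADDRESS, re.ASCII)
print(named.group("user"), named.group("domain"), named.group(1))

# (?:re): groups the alternation without creating a submatch, so the only
# captured group is the plain (com) group.
non_capturing = re.search(r"(?:yahoo|gmail)\.(com)", ADDRESS)
print(non_capturing.groups())
```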
Empty strings
^               at beginning of text or line (m=true)
$               at end of text (like \z not \Z) or line (m=true)
\A              at beginning of text
\b              at ASCII word boundary (\w on one side and \W, \A, or \z on the other)
\B              not at ASCII word boundary
\g              at beginning of subtext being searched (NOT SUPPORTED) PCRE
\G              at end of last match (NOT SUPPORTED) PERL
\Z              at end of text, or before newline at end of text (NOT SUPPORTED)
\z              at end of text
(?=re)          before text matching re (NOT SUPPORTED)
(?!re)          before text not matching re (NOT SUPPORTED)
(?<=re)         after text matching re (NOT SUPPORTED)
(?<!re)         after text not matching re (NOT SUPPORTED)
re&             before text matching re (NOT SUPPORTED) VIM
re@=            before text matching re (NOT SUPPORTED) VIM
re@!            before text not matching re (NOT SUPPORTED) VIM
re@<=           after text matching re (NOT SUPPORTED) VIM
re@<!           after text not matching re (NOT SUPPORTED) VIM
\zs             sets start of match (= \K) (NOT SUPPORTED) VIM
\ze             sets end of match (NOT SUPPORTED) VIM
\%^             beginning of file (NOT SUPPORTED) VIM
\%$             end of file (NOT SUPPORTED) VIM
\%V             on screen (NOT SUPPORTED) VIM
\%#             cursor position (NOT SUPPORTED) VIM
\%'m            mark m position (NOT SUPPORTED) VIM
\%23l           in line 23 (NOT SUPPORTED) VIM
\%23c           in column 23 (NOT SUPPORTED) VIM
\%23v           in virtual column 23 (NOT SUPPORTED) VIM
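Empty-string assertions such as \b match a position rather than a character. The Python sketch below contrasts \b with the (\W|^)…(\W|$) wrapper used in the examples earlier in this article, and \A with ^ in multi-line mode. Python's re module is not RE2, but \b, \A, and ^ behave the same way for these inputs when re.ASCII is set; \z is not used because older Python versions do not support it.

```python
import re

TEXT = "hot stock tips!"

# (\W|^) and (\W|$) consume the delimiter characters, so they become part of the match.
wrapped = re.search(r"(\W|^)stock\stips(\W|$)", TEXT, re.ASCII).group()
# \b only asserts a word boundary, so the match is the phrase itself.
bounded = re.search(r"\bstock\stips\b", TEXT, re.ASCII).group()
print(repr(wrapped), repr(bounded))

# \b also rejects the phrase when it is embedded in a longer word.
print(re.search(r"\bstock\stips\b", "livestock tips", re.ASCII))

# \A matches only at the beginning of the text, even in multi-line mode,
# while ^ with the m flag matches at the start of every line.
print(re.search(r"(?m)\Atips", "stock\ntips"), re.search(r"(?m)^tips", "stock\ntips") is not None)
```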
Escape sequences
\a              bell (≡ \007)
\f              form feed (≡ \014)
\t              horizontal tab (≡ \011)
\n              newline (≡ \012)
\r              carriage return (≡ \015)
\v              vertical tab character (≡ \013)
\*              literal *, for any punctuation character *
\123            octal character code (up to three digits)
\x7F            hex character code (exactly two digits)
\x{10FFFF}      hex character code
\C              match a single byte even in UTF-8 mode
\Q...\E         literal text ... even if ... has punctuation
\1              backreference (NOT SUPPORTED)
\b              backspace (NOT SUPPORTED) (use \010)
\cK             control char ^K (NOT SUPPORTED) (use \001 etc)
\e              escape (NOT SUPPORTED) (use \033)
\g1             backreference (NOT SUPPORTED)
\g{1}           backreference (NOT SUPPORTED)
\g{+1}          backreference (NOT SUPPORTED)
\g{-1}          backreference (NOT SUPPORTED)
\g{name}        named backreference (NOT SUPPORTED)
\g<name>        subroutine call (NOT SUPPORTED)
\g'name'        subroutine call (NOT SUPPORTED)
\k<name>        named backreference (NOT SUPPORTED)
\k'name'        named backreference (NOT SUPPORTED)
\lX             lowercase X (NOT SUPPORTED)
\ux             uppercase x (NOT SUPPORTED)
\L...\E         lowercase text ... (NOT SUPPORTED)
\K              reset beginning of $0 (NOT SUPPORTED)
\N{name}        named Unicode character (NOT SUPPORTED)
\R              line break (NOT SUPPORTED)
\U...\E         upper case text ... (NOT SUPPORTED)
\X              extended Unicode sequence (NOT SUPPORTED)
\%d123          decimal character 123 (NOT SUPPORTED) VIM
\%xFF           hex character FF (NOT SUPPORTED) VIM
\%o123          octal character 123 (NOT SUPPORTED) VIM
\%u1234         Unicode character 0x1234 (NOT SUPPORTED) VIM
\%U12345678     Unicode character 0x12345678 (NOT SUPPORTED) VIM
Character class elements
x               single character
A-Z             character range (inclusive)
\d              Perl character class
[:foo:]         ASCII character class foo
\p{Foo}         Unicode character class Foo
\pF             Unicode character class F (one-letter name)
Named character classes as character class elements
[\d]            digits (≡ \d)
[^\d]           not digits (≡ \D)
[\D]            not digits (≡ \D)
[^\D]           not not digits (≡ \d)
[[:name:]]      named ASCII class inside character class (≡ [:name:])
[^[:name:]]     named ASCII class inside negated character class (≡ [:^name:])
[\p{Name}]      named Unicode property inside character class (≡ \p{Name})
[^\p{Name}]     named Unicode property inside negated character class (≡ \P{Name})
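The \d equivalences listed above can be confirmed with a short check. This sketch uses Python's re module, which is not RE2 and does not support the [[:name:]] or \p{Name} forms, so only the \d and \D rows are covered. The helper same_matches and the sample text are assumptions.

```python
import re


def same_matches(pattern_a, pattern_b, text):
    """Return True if both patterns find the same list of matches in text."""
    return re.findall(pattern_a, text, re.ASCII) == re.findall(pattern_b, text, re.ASCII)


TEXT = "PO#12-3456"

print(same_matches(r"[\d]", r"\d", TEXT))    # [\d]  ≡ \d
print(same_matches(r"[^\d]", r"\D", TEXT))   # [^\d] ≡ \D
print(same_matches(r"[\D]", r"\D", TEXT))    # [\D]  ≡ \D
print(same_matches(r"[^\D]", r"\d", TEXT))   # [^\D] ≡ \d
```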
Author: Lark Help Center
Last updated: 2022-12-13