Lark supports RE2 syntax. Regular expressions are case-sensitive by default.
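As a minimal sketch of the case-sensitivity default, the snippet below compares a plain pattern with one that starts with the `(?i)` flag. Python's standard `re` module is used as a stand-in here (an assumption, since Lark's own engine is RE2); the patterns use only syntax that both accept, and the sample text is made up.

```python
import re

# Hypothetical sample text, not taken from Lark.
text = "Quarterly Report and quarterly report"

# Default behavior: case-sensitive, so only the lowercase occurrence matches.
case_sensitive = re.findall(r"quarterly report", text)

# (?i) at the start of the pattern turns on case-insensitive matching.
case_insensitive = re.findall(r"(?i)quarterly report", text)

print(case_sensitive)
print(case_insensitive)
```
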
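Regular expression use cases
This article explains the basics of using and writing regular expressions in Lark. Each use case includes the kind of text to search for, one or more regular expressions that find it, and brief notes on the syntax and special characters involved.
- Search for an exact phrase
- Search for any phrase or word in a list
- Search for words with similar spellings or with special characters
- Find all email addresses in a specific domain
- Find all IP addresses within a specific range
- Search for patterns that combine letters and digits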
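To give a feel for three of the use cases listed above, here is a rough sketch. The domain `example.com`, the range 192.168.1.0 to 192.168.1.255, and the ticket format are all hypothetical and chosen only for illustration. Python's standard `re` module stands in for Lark's RE2 engine (an assumption); the patterns stick to syntax both accept.

```python
import re

# Hypothetical sample text; the domain, addresses, and ticket IDs are made up.
text = (
    "Contact alice@example.com or bob@example.org. "
    "Servers: 192.168.1.7, 192.168.1.255, 192.168.2.10. "
    "Ticket AB1234 replaces ticket ab12."
)

# Email addresses in one domain: the dot is escaped so it matches literally.
emails = re.findall(r"[A-Za-z0-9._%+-]+@example\.com\b", text)

# IP addresses in 192.168.1.0 to 192.168.1.255: the last octet is spelled out
# as 250-255, 200-249, 100-199, or 0-99. \b stops matches inside longer numbers.
ips = re.findall(r"\b192\.168\.1\.(?:25[0-5]|2[0-4]\d|1\d\d|\d{1,2})\b", text)

# Letters combined with digits: two uppercase letters, then exactly four digits.
tickets = re.findall(r"\b[A-Z]{2}\d{4}\b", text)

print(emails)
print(ips)
print(tickets)
```

Because matching is case-sensitive by default, `ab12` is not reported as a ticket.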
Appendix: Regular expression syntax
Kinds of single-character expressions | Examples |
any character, possibly including newline (s=true) | . |
character class | [xyz] |
negated character class | [^xyz] |
Perl character class | \d |
negated Perl character class | \D |
ASCII character class | [[:alpha:]] |
negated ASCII character class | [[:^alpha:]] |
Unicode character class (one-letter name) | \pN |
Unicode character class | \p{Greek} |
negated Unicode character class (one-letter name) | \PN |
negated Unicode character class | \P{Greek} |
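The single-character forms above can be tried out with a short sketch. It covers only `.`, `[xyz]`, `[^xyz]`, and `\d`, because Python's standard `re` module (used here as a stand-in for RE2, which is an assumption) does not accept `[[:alpha:]]` or `\pN`. The sample strings are made up.

```python
import re

# Hypothetical sample text with several spellings.
text = "gray grey griy gr y"

# Character class: exactly one of the listed characters.
either_spelling = re.findall(r"gr[ae]y", text)

# Dot: any single character except newline (unless the s flag is set).
any_middle = re.findall(r"gr.y", text)

# Negated character class: any single character not listed.
not_a_or_e = re.findall(r"gr[^ae]y", text)

# Perl character class \d: a single digit.
digits = re.findall(r"\d", "a1b22")

print(either_spelling, any_middle, not_a_or_e, digits)
```
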
| Composites |
xy | x followed by y |
x|y | x or y (prefer x) |
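The note "prefer x" means that, at a given starting position, the left alternative wins even when the right one would match more text. A small sketch, again using Python's standard `re` as a stand-in for RE2 (an assumption; both behave the same way for this pattern):

```python
import re

# x|y prefers x: "a" is tried first and succeeds, so "ab" is never considered.
left_first = re.search(r"a|ab", "ab").group()

# Listing the longer alternative first lets it win.
longer_first = re.search(r"ab|a", "ab").group()

# xy: x immediately followed by y.
concatenated = re.search(r"ab", "cab").group()

print(left_first, longer_first, concatenated)
```

When one alternative is a prefix of another, putting the longer one first is usually what is intended.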
| Repetitions |
x* | zero or more x, prefer more |
x+ | one or more x, prefer more |
x? | zero or one x, prefer one |
x{n,m} | n or n+1 or ... or m x, prefer more |
x{n,} | n or more x, prefer more |
x{n} | exactly n x |
x*? | zero or more x, prefer fewer |
x+? | one or more x, prefer fewer |
x?? | zero or one x, prefer zero |
x{n,m}? | n or n+1 or ... or m x, prefer fewer |
x{n,}? | n or more x, prefer fewer |
x{n}? | exactly n x |
x{} | (≡ x*) (NOT SUPPORTED) VIM |
x{-} | (≡ x*?) (NOT SUPPORTED) VIM |
x{-n} | (≡ x{n}?) (NOT SUPPORTED) VIM |
x= | (≡ x?) (NOT SUPPORTED) VIM |
Implementation restriction: The counting forms x{n,m}, x{n,}, and x{n} reject forms that create a minimum or maximum repetition count above 1000. Unlimited repetitions are not subject to this restriction.
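The difference between "prefer more" (greedy) and "prefer fewer" (lazy) is easiest to see side by side. The sketch below uses Python's standard `re` as a stand-in for RE2 (an assumption); it does not reproduce the 1000-repetition limit, which is specific to RE2. The sample text is made up.

```python
import re

# Hypothetical sample text.
text = "<b>bold</b> and <i>italic</i>"

# x+ prefers more: the match runs from the first "<" to the last ">".
greedy = re.findall(r"<.+>", text)

# x+? prefers fewer: each tag is matched separately.
lazy = re.findall(r"<.+?>", text)

# x{n,m}: between n and m repetitions, preferring more.
counted = re.findall(r"\d{2,4}", "7 42 12345")

print(greedy)
print(lazy)
print(counted)
```
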
| Possessive repetitions |
x*+ | zero or more x, possessive (NOT SUPPORTED) |
x++ | one or more x, possessive (NOT SUPPORTED) |
x?+ | zero or one x, possessive (NOT SUPPORTED) |
x{n,m}+ | n or ... or m x, possessive (NOT SUPPORTED) |
x{n,}+ | n or more x, possessive (NOT SUPPORTED) |
x{n}+ | exactly n x, possessive (NOT SUPPORTED) |
| Flags |
i | case-insensitive (default false) |
m | multi-line mode: ^ and $ match begin/end line in addition to begin/end text (default false) |
s | let . match \n (default false) |
U | ungreedy: swap meaning of x* and x*?, x+ and x+?, etc (default false) |
Flag syntax is xyz (set) or -xyz (clear) or xy-z (set xy, clear z).
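The following sketch shows the `s` and `m` flags and a flag scoped to a group. Python's standard `re` stands in for RE2 (an assumption); it accepts `(?s)`, `(?m)`, and `(?i:re)` in the same way but has no `U` flag, so that one is left out. The sample text is made up.

```python
import re

# Hypothetical two-line sample text.
text = "first line\nSecond line"

# Without s, "." does not cross the newline, so the match stays on line one.
single_line = re.search(r"first.*line", text).group()

# With (?s), "." also matches "\n" and the greedy match spans both lines.
across_lines = re.search(r"(?s)first.*line", text).group()

# Without m, "^" matches only at the start of the text.
text_start = re.findall(r"^\w+", text)

# With (?m), "^" also matches at the start of every line.
line_starts = re.findall(r"(?m)^\w+", text)

# (?i:re) applies the flag only inside the group; " LINE" stays case-sensitive.
scoped_hit = re.search(r"(?i:second) line", text)
scoped_miss = re.search(r"(?i:second) LINE", text)

print(single_line, repr(across_lines), text_start, line_starts)
```
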
| Grouping |
(re) | numbered capturing group (submatch) |
(?P<name>re) | named & numbered capturing group (submatch) |
(?<name>re) | named & numbered capturing group (submatch) (NOT SUPPORTED) |
(?'name're) | named & numbered capturing group (submatch) (NOT SUPPORTED) |
(?:re) | non-capturing group |
(?flags) | set flags within current group; non-capturing |
(?flags:re) | set flags during re; non-capturing |
(?#text) | comment (NOT SUPPORTED) |
(?|x|y|z) | branch numbering reset (NOT SUPPORTED) |
(?>re) | possessive match of re (NOT SUPPORTED) |
re@> | possessive match of re (NOT SUPPORTED) VIM |
%(re) | non-capturing group (NOT SUPPORTED) VIM |
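As a quick illustration of the supported grouping forms, the sketch below uses a named group, which is also numbered, and a non-capturing group. Python's standard `re` stands in for RE2 (an assumption); `(?P<name>re)` and `(?:re)` behave the same in both. The addresses and names are made up.

```python
import re

# (?P<name>re): each group can be read by name or by number.
m = re.search(r"(?P<user>[a-z]+)@(?P<domain>[a-z]+\.com)", "mail alice@example.com now")
user_by_name = m.group("user")
user_by_number = m.group(1)
domain = m.group("domain")

# (?:re): groups the alternation without creating a submatch,
# so the surname is the only captured group.
m2 = re.search(r"(?:Mr|Ms)\. (\w+)", "Ms. Tanaka")
captured = m2.groups()

print(user_by_name, user_by_number, domain, captured)
```
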
| Empty strings |
^ | at beginning of text or line (m=true) |
$ | at end of text (like \z not \Z) or line (m=true) |
\A | at beginning of text |
\b | at ASCII word boundary (\w on one side and \W, \A, or \z on the other) |
\B | not at ASCII word boundary |
\g | at beginning of subtext being searched (NOT SUPPORTED) PCRE |
\G | at end of last match (NOT SUPPORTED) PERL |
\Z | at end of text, or before newline at end of text (NOT SUPPORTED) |
\z | at end of text |
(?=re) | before text matching re (NOT SUPPORTED) |
(?!re) | before text not matching re (NOT SUPPORTED) |
(?<=re) | after text matching re (NOT SUPPORTED) |
(?<!re) | after text not matching re (NOT SUPPORTED) |
re& | before text matching re (NOT SUPPORTED) VIM |
re@= | before text matching re (NOT SUPPORTED) VIM |
re@! | before text not matching re (NOT SUPPORTED) VIM |
re@<= | after text matching re (NOT SUPPORTED) VIM |
re@<! | after text not matching re (NOT SUPPORTED) VIM |
\zs | sets start of match (= \K) (NOT SUPPORTED) VIM |
\ze | sets end of match (NOT SUPPORTED) VIM |
\%^ | beginning of file (NOT SUPPORTED) VIM |
\%$ | end of file (NOT SUPPORTED) VIM |
\%V | on screen (NOT SUPPORTED) VIM |
\%# | cursor position (NOT SUPPORTED) VIM |
\%'m | mark m position (NOT SUPPORTED) VIM |
\%23l | in line 23 (NOT SUPPORTED) VIM |
\%23c | in column 23 (NOT SUPPORTED) VIM |
\%23v | in virtual column 23 (NOT SUPPORTED) VIM |
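Since lookahead and lookbehind are listed as not supported, `\b`, `\B`, and the text anchors do most of the positional work. A minimal sketch with Python's standard `re` as a stand-in for RE2 (an assumption); `re.ASCII` is passed so that `\b` follows the ASCII word-boundary rule described in the table. The sample text is made up.

```python
import re

# Hypothetical sample text.
text = "cat concat cat_food cat"

# \b: "cat" only as a whole word. "_" counts as a word character,
# so "cat_food" is not a match.
whole_words = re.findall(r"\bcat\b", text, re.ASCII)

# \B: "cat" only where it does not start a word (the end of "concat").
inside_word = re.findall(r"\Bcat", text, re.ASCII)

# \A: only at the very beginning of the text.
at_start = re.findall(r"\Acat", text, re.ASCII)

print(len(whole_words), len(inside_word), len(at_start))
```
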
| Escape sequences |
\a | bell (≡ \007) |
\f | form feed (≡ \014) |
\t | horizontal tab (≡ \011) |
\n | newline (≡ \012) |
\r | carriage return (≡ \015) |
\v | vertical tab character (≡ \013) |
\* | literal *, for any punctuation character * |
\123 | octal character code (up to three digits) |
\x7F | hex character code (exactly two digits) |
\x{10FFFF} | hex character code |
\C | match a single byte even in UTF-8 mode |
\Q...\E | literal text ... even if ... has punctuation |
\1 | backreference (NOT SUPPORTED) |
\b | backspace (NOT SUPPORTED) (use \010) |
\cK | control char ^K (NOT SUPPORTED) (use \001 etc) |
\e | escape (NOT SUPPORTED) (use \033) |
\g1 | backreference (NOT SUPPORTED) |
\g{1} | backreference (NOT SUPPORTED) |
\g{+1} | backreference (NOT SUPPORTED) |
\g{-1} | backreference (NOT SUPPORTED) |
\g{name} | named backreference (NOT SUPPORTED) |
\g<name> | subroutine call (NOT SUPPORTED) |
\g'name' | subroutine call (NOT SUPPORTED) |
\k<name> | named backreference (NOT SUPPORTED) |
\k'name' | named backreference (NOT SUPPORTED) |
\lX | lowercase X (NOT SUPPORTED) |
\ux | uppercase x (NOT SUPPORTED) |
\L...\E | lowercase text ... (NOT SUPPORTED) |
\K | reset beginning of $0 (NOT SUPPORTED) |
\N{name} | named Unicode character (NOT SUPPORTED) |
\R | line break (NOT SUPPORTED) |
\U...\E | upper case text ... (NOT SUPPORTED) |
\X | extended Unicode sequence (NOT SUPPORTED) |
\%d123 | decimal character 123 (NOT SUPPORTED) VIM |
\%xFF | hex character FF (NOT SUPPORTED) VIM |
\%o123 | octal character 123 (NOT SUPPORTED) VIM |
\%u1234 | Unicode character 0x1234 (NOT SUPPORTED) VIM |
\%U12345678 | Unicode character 0x12345678 (NOT SUPPORTED) VIM |
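A few of the supported escapes can be checked with the sketch below. Python's standard `re` stands in for RE2 (an assumption); it handles `\*`, `\t`, octal, and two-digit hex escapes the same way, but it does not accept `\Q...\E` or `\x{...}`, so those are omitted. The sample strings are made up.

```python
import re

# Backslash before punctuation matches that character literally.
price = re.search(r"\$\d+\.\d{2}", "Total: $19.99 (was $25)").group()
stars = re.findall(r"\*", "a*b*c")

# Three spellings of the same tab character: \t, octal \011, hex \x09.
tabbed = "a\tb"
by_name = re.search(r"a\tb", tabbed) is not None
by_octal = re.search(r"a\011b", tabbed) is not None
by_hex = re.search(r"a\x09b", tabbed) is not None

print(price, stars, by_name, by_octal, by_hex)
```
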
| Character class elements |
x | single character |
A-Z | character range (inclusive) |
\d | Perl character class |
[:foo:] | ASCII character class foo |
\p{Foo} | Unicode character class Foo |
\pF | Unicode character class F (one-letter name) |
| Named character classes as character class elements |
[\d] | digits (≡ \d) |
[^\d] | not digits (≡ \D) |
[\D] | not digits (≡ \D) |
[^\D] | not not digits (≡ \d) |
[[:name:]] | named ASCII class inside character class (≡ [:name:]) |
[^[:name:]] | named ASCII class inside negated character class (≡ [:^name:]) |
[\p{Name}] | named Unicode property inside character class (≡ \p{Name}) |
[^\p{Name}] | named Unicode property inside negated character class (≡ \P{Name}) |
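The equivalences in the last table can be verified directly for the Perl classes. The sketch uses Python's standard `re` as a stand-in for RE2 (an assumption); the `[[:name:]]` and `\p{Name}` rows are skipped because Python's `re` does not accept them. The sample text is made up.

```python
import re

# Hypothetical sample text.
text = "Room B12, floor 3"

# [\d] and [^\D] both mean "a digit", the same as \d.
plain_digits = re.findall(r"\d", text)
bracket_digits = re.findall(r"[\d]", text)
double_negated = re.findall(r"[^\D]", text)

# [^\d] and [\D] both mean "not a digit", the same as \D.
not_digits_a = re.findall(r"[^\d]", text)
not_digits_b = re.findall(r"[\D]", text)

# Elements can be mixed: an uppercase range plus the digit class.
upper_or_digit = re.findall(r"[A-Z\d]+", text)

print(plain_digits, upper_or_digit)
```
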