Using regex to find multiple matches between two strings
Question
Imagine I have a string like this:
c x c x A c x c x c B c x c x
And I want to find any "c" character that is between "A" and "B". So in this example I need to get 3 matches.
I know that I can use lookahead and lookbehind tokens. So I used this regex:
(?<=A).*c.*(?=B)
But it returns the whole string between A and B, c x c x c, as a single result.
And if I remove the .* parts, there will be no match at all.
I made an example here so you can see the results.
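For reference, the behaviour described above can be reproduced with a short sketch. It assumes Python's built-in re module, which the question does not name, and the variable names are illustrative only.

```python
import re

sample = "c x c x A c x c x c B c x c x"

# Greedy .* on both sides swallows everything between A and B,
# so the three c characters come back as one single match.
greedy = re.findall(r"(?<=A).*c.*(?=B)", sample)

# Without .*, c would have to sit directly between A and B,
# which never happens in this sample.
strict = re.findall(r"(?<=A)c(?=B)", sample)

print(greedy)  # [' c x c x c ']
print(strict)  # []
```

The single result includes the spaces around the c characters, because the lookbehind and lookahead only exclude A and B themselves.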
Recommended answer
There are two common scenarios here: 1) A and B are different single-character strings, 2) A and B are different multicharacter strings.
Scenario 1
You may use negated character classes:
(?:\G(?!^)|A)[^AB]*?\Kc(?=[^AB]*B)
See the regex demo. Details:
- (?:\G(?!^)|A) - A or the end of the previous successful match
- [^AB]*? - zero or more chars other than A and B, as few as possible
- \K - match reset operator that discards all text matched so far from the overall match memory buffer
- c - a c char/string
- (?=[^AB]*B) - immediately to the right of the current location, there must be zero or more chars other than A and B, and then a B char.
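The pattern above relies on \G and \K, which Python's built-in re module does not support. As a rough sketch of the same idea in Python, a two-step approach (a different technique from the single regex in this answer) first isolates each A...B span and then searches inside it. The function name chars_between and its parameters are hypothetical.

```python
import re

def chars_between(text, target="c"):
    """Return the index of every `target` found inside an A...B span."""
    positions = []
    # Step 1: each span that starts at A and ends at B,
    # with no other A or B in between.
    for span in re.finditer(r"A([^AB]*)B", text):
        offset = span.start(1)
        # Step 2: every target inside the span, shifted back
        # to an index in the full text.
        for hit in re.finditer(re.escape(target), span.group(1)):
            positions.append(offset + hit.start())
    return positions

sample = "c x c x A c x c x c B c x c x"
positions = chars_between(sample)
print(positions)  # [10, 14, 18]
```

On the sample string this reports the three c characters between A and B and ignores the ones outside.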
Scenario 2
If A and B are placeholders for multichar strings, say, ABC and BCE, and c is some pattern like c\d+ (to match c and one or more digits after it), use
(?s)(?:\G(?!^)|ABC)(?:(?!ABC).)*?\Kc\d+(?=.*?BCE)
See the regex demo. Details:
- (?s) - a DOTALL modifier that makes . match any char, including line break chars
- (?:\G(?!^)|ABC) - ABC or the end of the previous successful match
- (?:(?!ABC).)*? - any char, zero or more times but as few as possible, that does not start an ABC char sequence
- \K - match reset operator
- c\d+ - c and one or more digits
- (?=.*?BCE) - immediately to the right, there must be any zero or more chars, as few as possible, followed with BCE.
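For engines without \G and \K, such as Python's built-in re module, the multichar case can be approximated in two steps. This is a sketch of a swapped-in technique, not the single regex above: it only looks inside complete ABC...BCE spans. The function name tokens_between and its parameter names are hypothetical.

```python
import re

def tokens_between(text, start="ABC", end="BCE", token=r"c\d+"):
    """Return every `token` match found inside a start...end span."""
    # Tempered dot: consume chars one at a time, as few as possible,
    # never stepping onto the beginning of another `start` sequence.
    span_re = re.compile(
        re.escape(start)
        + r"((?:(?!" + re.escape(start) + r").)*?)"
        + re.escape(end),
        re.DOTALL,  # same role as (?s): let . match line breaks too
    )
    found = []
    for span in span_re.finditer(text):
        found.extend(re.findall(token, span.group(1)))
    return found

sample = "c1 ABC c2 x c33 BCE c4"
tokens = tokens_between(sample)
print(tokens)  # ['c2', 'c33']
```

Only c2 and c33 are returned, since c1 and c4 fall outside the ABC...BCE span.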