Using regex to find multiple matches between two strings

Problem description

Imagine I have a string like this:

c x c x A c x c x c B c x c x

I want to find every "c" character that lies between "A" and "B". So in this example I need to get 3 matches.

I know that I can use lookahead and lookbehind tokens, so I used this regex:

(?<=A).*c.*(?=B)

But it matches the whole string between A and B, c x c x c, as a single result.

And if I remove the .* parts, there is no match at all.

I made an example here so you can see the results.
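The behaviour described in the question can be reproduced with a minimal sketch in Python's standard `re` module, which supports the lookarounds used there. The variable names are illustrative only, and the sample string is taken with its spaces exactly as shown above.

```python
import re

sample = "c x c x A c x c x c B c x c x"

# Greedy version from the question: .* swallows everything between A and B,
# so the three c characters end up inside one single match.
greedy = re.findall(r"(?<=A).*c.*(?=B)", sample)

# Without the .* parts, c would have to sit directly between A and B,
# which never happens in this string, so nothing matches.
strict = re.findall(r"(?<=A)c(?=B)", sample)

print(len(greedy), repr(greedy[0]))
print(strict)
```

A lookbehind and a lookahead only test the text directly adjacent to the match, which is why neither variant yields the three separate matches.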

Recommended answer

There are two common scenarios here: 1) A and B are different single-character strings, 2) A and B are different multicharacter strings.

Scenario 1

You may use negated character classes:

(?:\G(?!^)|A)[^AB]*?\Kc(?=[^AB]*B)

See this regex demo. Details:

  • (?:\G(?!^)|A) - A or end of the previous successful match
  • [^AB]*? - any zero or more chars other than A and B, as few as possible
  • \K - match reset operator that discards all text matched so far in the overall memory match buffer
  • c - a c char/string
  • (?=[^AB]*B) - that must be followed with zero or more chars other than A and B and then B char immediately to the right of the current location.
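The pattern above relies on \G and \K, which Python's built-in `re` module does not support. As a rough sketch of the same idea in that module, the work can be split into two steps: first find each A...B stretch that contains no other A or B, then look for c inside it. This is a swapped-in technique, not the single regex from the answer, and the names `SEGMENT`, `TARGET` and `find_c_between` are hypothetical.

```python
import re

# An A...B stretch with no A or B inside; group 1 is the text in between.
SEGMENT = re.compile(r"A([^AB]*)B")
TARGET = re.compile(r"c")

def find_c_between(text):
    """Return the start offsets of every c that lies between an A and the next B."""
    offsets = []
    for seg in SEGMENT.finditer(text):
        base = seg.start(1)  # offset of the in-between text within the full string
        offsets.extend(base + m.start() for m in TARGET.finditer(seg.group(1)))
    return offsets

sample = "c x c x A c x c x c B c x c x"
offsets = find_c_between(sample)
print(offsets)  # [10, 14, 18]
```

The negated class [^AB] plays the same role here as in the single-regex version: it keeps the search from crossing another A or B.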

Scenario 2

If A and B are placeholders for multichar strings, say, ABC and BCE, and c is some pattern like c\d+ (to match c and one or more digits after it), use:

(?s)(?:\G(?!^)|ABC)(?:(?!ABC).)*?\Kc\d+(?=.*?BCE)

See this regex demo. Details:

  • (?s) - a DOTALL modifier that makes the regex engine match any char with .
  • (?:\G(?!^)|ABC) - ABC or end of the previous successful match
  • (?:(?!ABC).)*? - any char, 0 or more times, that does not start an ABC char sequence
  • \K - match reset operator
  • c\d+ - c and one or more digits
  • (?=.*?BCE) - any zero or more chars, as few as possible, followed with BCE.
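For engines without \G and \K, such as Python's built-in `re` module, the multicharacter case can be approximated in two steps as well. The sketch below reuses the tempered token (?:(?!ABC).)*? from the breakdown above to isolate each ABC...BCE block, then searches for c\d+ inside it. It is an assumption-laden illustration rather than a drop-in replacement: it stops at the first BCE after each ABC, so it may behave differently from the single regex on inputs with repeated or nested delimiters. The names `BLOCK`, `ITEM` and `find_items_between` are hypothetical.

```python
import re

# ABC, then as few chars as possible that do not start another ABC, then BCE.
# re.DOTALL lets . match newlines, like (?s) in the original pattern.
BLOCK = re.compile(r"ABC((?:(?!ABC).)*?)BCE", re.DOTALL)
ITEM = re.compile(r"c\d+")

def find_items_between(text):
    """Return every c\\d+ match found inside an ABC...BCE block."""
    items = []
    for block in BLOCK.finditer(text):
        items.extend(ITEM.findall(block.group(1)))
    return items

sample2 = "c1 x ABC c22 x\nc333 BCE c4 x"
found = find_items_between(sample2)
print(found)  # ['c22', 'c333']
```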
