Combining two regular expressions A and B into C = (A and not B)


Problem description

Let's say I have one regular expression A and another regular expression B as input. I want to create a new regular expression C which matches a line if and only if

  • A matches the line and
  • B does not match the line.

I am able to manually create C for very simple cases of A and B: Let's say A is x and B is y, then C = ^[^y]*x[^y]*$ would be a valid solution.

Obviously, the problem gets harder as A and B get more complex. Is there a generic algorithm for creating such a regular expression C out of A and B?


Note: Since regular languages are closed under intersection and complement, such an algorithm should exist in theory. I am aware that the regular expressions available in modern systems are more expressive than formal regular languages. A solution in which A and B are restricted to the features of formal regular expressions, while C may use the extended features of modern regex engines, is perfectly fine for me.
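The closure argument in the note can be made concrete on the automaton side. The following is a minimal sketch, not a general regex-to-regex algorithm: it hand-builds two complete DFAs for the simple case above (strings containing x, strings containing y) over an assumed three-letter alphabet, combines them with the product construction, and compares the result with the manually written C. The helper names (contains_dfa, difference, accepts) and the dictionary layout are illustrative assumptions. Converting the resulting DFA back into a regular expression, for example by state elimination, is not shown.

```python
import re
from itertools import product


def contains_dfa(symbol, alphabet):
    """Complete DFA accepting strings that contain `symbol` (state 1 = seen it)."""
    delta = {
        (state, ch): 1 if (state == 1 or ch == symbol) else 0
        for state in (0, 1)
        for ch in alphabet
    }
    return {"states": {0, 1}, "start": 0, "accepting": {1}, "delta": delta}


def difference(dfa_a, dfa_b, alphabet):
    """Product construction: accept when dfa_a accepts and dfa_b does not."""
    states = set(product(dfa_a["states"], dfa_b["states"]))
    delta = {
        ((p, q), ch): (dfa_a["delta"][p, ch], dfa_b["delta"][q, ch])
        for (p, q) in states
        for ch in alphabet
    }
    accepting = {
        (p, q)
        for (p, q) in states
        if p in dfa_a["accepting"] and q not in dfa_b["accepting"]
    }
    start = (dfa_a["start"], dfa_b["start"])
    return {"states": states, "start": start, "accepting": accepting, "delta": delta}


def accepts(dfa, text):
    """Run the DFA over `text` and report whether it ends in an accepting state."""
    state = dfa["start"]
    for ch in text:
        state = dfa["delta"][state, ch]
    return state in dfa["accepting"]


ALPHABET = "xyz"  # assumed alphabet, chosen only for this illustration
dfa_c = difference(contains_dfa("x", ALPHABET), contains_dfa("y", ALPHABET), ALPHABET)
manual_c = re.compile(r"^[^y]*x[^y]*$")

# Compare the product automaton with the hand-written regex on all short strings.
all_agree = all(
    accepts(dfa_c, "".join(letters)) == bool(manual_c.match("".join(letters)))
    for n in range(5)
    for letters in product(ALPHABET, repeat=n)
)
print(all_agree)
```

The product automaton tracks both DFAs at once, which is why intersection with a complement needs no lookahead at this level; the hard part in practice is getting from arbitrary regex syntax to a DFA and back.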

Solution

Edit

Based on the OP's initial regex, and as pointed out by @ruakh in the comments below my answer, the OP has chosen to use ^(?!.*B).*A. That pattern accepts a line that contains a match of A and rejects any line that contains a match of B anywhere. My original post (below) targets something different: it rejects only lines that B matches in full. That was my original assumption; the OP later clarified the intended meaning in the comments below my answer.
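As a rough illustration of the "contains" reading, here is a sketch in Python's re module. The function name contains_a_not_b is hypothetical, and wrapping A and B in non-capturing groups is my own addition so that a top-level alternation inside either pattern does not leak into the surrounding expression. It assumes A and B use only features that Python's re supports.

```python
import re


def contains_a_not_b(a: str, b: str) -> re.Pattern:
    """Build ^(?!.*B).*A, with A and B wrapped in non-capturing groups."""
    return re.compile(rf"^(?!.*(?:{b})).*(?:{a})")


c = contains_a_not_b("x", "y")
print(bool(c.search("axb")))   # line contains x and no y
print(bool(c.search("axyb")))  # line contains y, so it is rejected
print(bool(c.search("abc")))   # line contains no x, so it is rejected

# Alternations stay intact because of the added grouping.
c_alt = contains_a_not_b("foo|bar", "baz|qux")
print(bool(c_alt.search("a bar")))
print(bool(c_alt.search("bar qux")))
```

Since . does not match a newline by default, this treats the input as a single line, which fits the question as asked.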


Original Post

If I understand your question correctly, you're looking to match a string that matches one given pattern A but does not match pattern B, such that your new pattern C is composed of both A and B.

Simple regex

Given that the pattern A is x and the pattern B is y, the new regex pattern C should be as follows:

^(?!B$)A$

or with the sample regex you presented:

^(?!y$)x$


Maybe a better example to demonstrate this is with the following:

  • A pattern: x.
  • B pattern: xx
  • C becomes: ^(?!xx$)x.$

This would match xa but not xx as seen here
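The same construction can be tried out in Python's re module. This is only a sketch: the helper name full_a_not_b is hypothetical, and the non-capturing groups around A and B are an addition of mine, since without them a pattern such as x|xy would split the combined expression at the alternation.

```python
import re


def full_a_not_b(a: str, b: str) -> re.Pattern:
    """Build ^(?!B$)A$, with A and B wrapped in non-capturing groups."""
    return re.compile(rf"^(?!(?:{b})$)(?:{a})$")


c = full_a_not_b("x.", "xx")
print(bool(c.match("xa")))   # A matches the whole string, B does not
print(bool(c.match("xx")))   # B matches the whole string, so it is rejected
print(bool(c.match("xab")))  # A does not match the whole string

# Grouping keeps a top-level alternation in A from escaping the anchors.
c_alt = full_a_not_b("x|xy", "xy")
print(bool(c_alt.match("x")))
print(bool(c_alt.match("xy")))
```

Note that in Python $ also matches just before a trailing newline; \Z can be substituted if that matters for the input.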


Complex regex

With more complex regular expressions, whether this works depends entirely on the patterns and on the regex engine used. The combined expression could time out, and if recursion, control verbs, pattern modifiers, etc. are used, it could break entirely.

A better option would be to evaluate both regular expressions independently with code to determine the outcome.
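A minimal sketch of that approach in Python, assuming the full-match reading used in this post (the function name matches_a_and_not_b is hypothetical):

```python
import re


def matches_a_and_not_b(a: str, b: str, line: str) -> bool:
    """True when A matches the whole line and B does not match the whole line."""
    return re.fullmatch(a, line) is not None and re.fullmatch(b, line) is None


print(matches_a_and_not_b("x.", "xx", "xa"))
print(matches_a_and_not_b("x.", "xx", "xx"))
print(matches_a_and_not_b("x.", "xx", "ya"))
```

Because each pattern is compiled on its own, group names, group numbers, and inline modifiers in A cannot interfere with those in B. For the "contains" reading, re.search can be used in place of re.fullmatch.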

Example 1

Here's an example where the regular expression breaks given that both patterns use the same predefined pattern name:

  • A: (?(DEFINE)(?<t>x))(?&t).
  • B: (?(DEFINE)(?<t>x))(?&t){2}
  • C: ^(?!(?(DEFINE)(?<t>x))(?&t){2}$)(?(DEFINE)(?<t>x))(?&t).$

It fails as shown here
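A similar name clash can be reproduced in Python, with the caveat that Python's re module has no (?(DEFINE)...) construct, so the sketch below uses ordinary named groups as a stand-in. The patterns A and B here are simplified analogues of the ones above, not the same expressions.

```python
import re

# Simplified stand-ins: both patterns define a group named "t".
A = r"(?P<t>x)."
B = r"(?P<t>x)x"

# Naively embedding both patterns in ^(?!B$)A$ defines "t" twice.
try:
    re.compile(rf"^(?!(?:{B})$)(?:{A})$")
    combined_error = None
except re.error as exc:
    combined_error = str(exc)

print(combined_error)

# Compiled separately, the two patterns do not interfere with each other.
independent_ok = (
    re.fullmatch(A, "xa") is not None and re.fullmatch(B, "xa") is None
)
print(independent_ok)
```

The combined pattern is rejected at compile time because the group name is defined twice, while the two separate patterns compile and evaluate without trouble.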

Example 2

Here's a recursion example that fails to work properly:

  • A: a(?R)?z
  • B: az
  • C: ^(?!az$)a(?R)?z$

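It fails as shown here. Once A is embedded in C, (?R) recurses into the whole of C, including the ^ anchor and the lookahead, rather than into A alone.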


Of course, all of this assumes that C = ^(?!B$)A$ is the correct pattern for matching A while not matching B.
