Regular expression to find two strings anywhere in input

Question

How do I write a regular expression to match two given strings, at any position in the string?

For example, if I am searching for cat and mat, it should match:

The cat slept on the mat in front of the fire.
At 5:00 pm, I found the cat scratching the wool off the mat.

No matter what precedes these strings.

Recommended answer

/^.*?\bcat\b.*?\bmat\b.*?$/m

Using the m modifier (which makes the beginning/end metacharacters match at line breaks as well as at the very beginning and end of the string):

  • ^ matches the line beginning
  • .*? matches anything on the line before...
  • \b matches the first occurrence of a word boundary (as @codaddict discussed)
  • then the string cat and another word boundary; note that underscores are treated as "word" characters, so _cat_ would not match*;
  • .*?: any characters before...
  • boundary, mat, boundary
  • .*?: any remaining characters before...
  • $: the end of the line.
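
The pieces listed above can be tried out with a short script. This is a minimal sketch in Python, which is an assumption (the answer does not name a language); re.MULTILINE stands in for the m modifier, and the sample text and the matching_lines helper are made up for illustration.

```python
import re

# Python spelling of /^.*?\bcat\b.*?\bmat\b.*?$/m
# (re.MULTILINE plays the role of the m modifier).
ORDERED = re.compile(r"^.*?\bcat\b.*?\bmat\b.*?$", re.MULTILINE)

# Hypothetical input: the two sentences from the question,
# plus two lines that should be rejected.
TEXT = "\n".join([
    "The cat slept on the mat in front of the fire.",
    "At 5:00 pm, I found the cat scratching the wool off the mat.",
    "The mat was under the cat.",   # words in the wrong order
    "Concatenate the mats.",        # only parts of longer words
])


def matching_lines(pattern, text):
    """Return every full line of `text` matched by `pattern`."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    for line in matching_lines(ORDERED, TEXT):
        print(line)
```

Run as a script, this prints only the first two lines of the sample text, because the other two either have the words in the wrong order or contain them only inside longer words.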

It's important to use \b to ensure the specified words aren't part of longer words. Non-greedy wildcards (.*?) are also preferable to greedy ones (.*): on strings like "There is a cat on top of the mat which is under the cat.", a greedy wildcard first tries the last occurrence of "cat" rather than the first, and a backtracking engine then has to back up until it finds a "cat" that is followed by "mat".
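
The effect of the word boundaries is easy to check in isolation. The sketch below assumes Python's re module; the two test sentences are invented for illustration and contain "cat" and "mat" either as whole words or only inside longer words.

```python
import re

# The pattern from the answer, and the same pattern with \b removed.
WITH_BOUNDARIES = re.compile(r"^.*?\bcat\b.*?\bmat\b.*?$", re.MULTILINE)
WITHOUT_BOUNDARIES = re.compile(r"^.*?cat.*?mat.*?$", re.MULTILINE)

# Hypothetical test lines.
embedded = "Concatenate the formats."   # "cat" and "mat" only inside longer words
whole = "The cat sat on the mat."       # both as whole words

if __name__ == "__main__":
    for name, pattern in (("with \\b", WITH_BOUNDARIES),
                          ("without \\b", WITHOUT_BOUNDARIES)):
        print(name,
              "| embedded:", bool(pattern.search(embedded)),
              "| whole:", bool(pattern.search(whole)))
```

Without \b, the line that merely contains "Concatenate" and "formats" is accepted; with \b it is rejected, while the whole-word sentence matches either way.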

* If you want to be able to match _cat_, you can use:

/^.*?(?:\b|_)cat(?:\b|_).*?(?:\b|_)mat(?:\b|_).*?$/m

which matches either underscores or word boundaries around the specified words. (?:) indicates a non-capturing group, which can help with performance or avoid conflicting captures.
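
To see the difference between the two variants, here is a small Python sketch (Python and the sample lines are assumptions made for illustration) that runs both patterns against a line where the words are wrapped in underscores.

```python
import re

# Strict version: \b only, so "_cat_" is rejected because "_" is a word character.
STRICT = re.compile(r"^.*?\bcat\b.*?\bmat\b.*?$", re.MULTILINE)

# Relaxed version from the answer: a word boundary or an underscore on each side.
UNDERSCORE_OK = re.compile(
    r"^.*?(?:\b|_)cat(?:\b|_).*?(?:\b|_)mat(?:\b|_).*?$",
    re.MULTILINE,
)

# Hypothetical sample lines.
underscored = "the _cat_ sat on the _mat_"
plain = "The cat sat on the mat."

if __name__ == "__main__":
    print("strict, underscored: ", bool(STRICT.search(underscored)))
    print("relaxed, underscored:", bool(UNDERSCORE_OK.search(underscored)))
    print("relaxed, plain:      ", bool(UNDERSCORE_OK.search(plain)))
```

The relaxed pattern accepts both lines, while the strict one accepts only the line without underscores.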

A question was raised in the comments about whether the solution would work for phrases rather than just words. The answer is, absolutely yes. The following would match "A line which includes both the first phrase and the second phrase":

/^.*?(?:\b|_)first phrase here(?:\b|_).*?(?:\b|_)second phrase here(?:\b|_).*?$/m
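
If the phrases come from user input, it may be convenient to build the pattern programmatically. The following is only a sketch, assuming Python: ordered_pattern and EDGE are hypothetical names, and re.escape is used so that any regex metacharacters inside a phrase are treated literally. It also assumes each phrase starts and ends with a word character; otherwise the boundary checks would behave differently.

```python
import re

# A word boundary or an underscore, as in the pattern above.
EDGE = r"(?:\b|_)"


def ordered_pattern(first, second):
    """Hypothetical helper: match lines containing `first` and, later, `second`.

    Both arguments are literal text; re.escape neutralises metacharacters.
    """
    return re.compile(
        "^.*?" + EDGE + re.escape(first) + EDGE
        + ".*?" + EDGE + re.escape(second) + EDGE + ".*?$",
        re.MULTILINE,
    )


pattern = ordered_pattern("first phrase", "second phrase")
in_order = "A line which includes both the first phrase and the second phrase"
reversed_order = "A line with the second phrase and then the first phrase"

if __name__ == "__main__":
    print(bool(pattern.search(in_order)))
    print(bool(pattern.search(reversed_order)))
```

Only the line with the phrases in the requested order is accepted, since this pattern is order-sensitive.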

Edit 2: If order doesn't matter you can use:

/^.*?(?:\b|_)(first(?:\b|_).*?(?:\b|_)second|second(?:\b|_).*?(?:\b|_)first)(?:\b|_).*?$/m
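
A quick way to check the order-independent pattern is to run it against lines with the words in both orders. This sketch assumes Python; the test lines are made up, and the pattern is copied from above with re.MULTILINE in place of the m modifier.

```python
import re

# Order-independent pattern from the answer: "first ... second" or "second ... first".
EITHER_ORDER = re.compile(
    r"^.*?(?:\b|_)(first(?:\b|_).*?(?:\b|_)second|second(?:\b|_).*?(?:\b|_)first)(?:\b|_).*?$",
    re.MULTILINE,
)

# Hypothetical test lines.
lines = {
    "first then second": True,    # words in the original order
    "second then first": True,    # words swapped
    "only first here": False,     # one word missing
    "firstly secondary": False,   # both only as parts of longer words
}

if __name__ == "__main__":
    for line, expected in lines.items():
        print(repr(line), bool(EITHER_ORDER.search(line)), "expected:", expected)
```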

If performance is really an issue here, lookaround (if your regex engine supports it) might, but probably won't, perform better than the above. I'll leave both the arguably more complex lookaround version and the performance testing as an exercise for the questioner/reader.
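
For readers who want a starting point for that exercise, one possible lookahead formulation is sketched below. It is not taken from the answer: it assumes Python's re module (which supports lookahead), matches the two words in either order, and makes no claim about relative performance.

```python
import re

# Each lookahead checks, from the start of the line, that the word occurs
# somewhere on that line; the final .* then consumes the line itself.
BOTH_WORDS = re.compile(r"^(?=.*\bcat\b)(?=.*\bmat\b).*$", re.MULTILINE)

# Hypothetical test lines.
cat_first = "The cat slept on the mat."
mat_first = "The mat was under the cat."
cat_only = "The cat slept."

if __name__ == "__main__":
    for line in (cat_first, mat_first, cat_only):
        print(repr(line), bool(BOTH_WORDS.search(line)))
```

Because "." does not cross line breaks by default, each lookahead stays within the current line, so the two words must appear on the same line to count.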

Edited per @Alan Moore's comment. I didn't have a chance to test it, but I'll take your word for it.
