JavaScript regex pattern: match multiple strings (AND, OR) against a single string

Question

I need to filter a collection of strings based on a rather complex query. In its "raw" form it looks like this:

nano* AND (regulat* OR *toxic* OR ((risk OR hazard) AND (exposure OR release)) )

An example of one of the strings to match against:

Workshop on the Second Regulatory Review on Nanomaterials, 30 January 2013, Brussels

So I need to match using AND, OR, and wildcard characters, and I presume I'll need to use a regex in JavaScript.

I have it all looping correctly, filtering and generally working, but I'm 100% sure my regex is wrong, and some results are being wrongly omitted. Here it is:

/(nano[a-zA-Z])?(regulat[a-zA-Z]|[a-zA-Z]toxic[a-zA-Z]|((risk|hazard)*(exposure|release)))/i

Any help would be greatly appreciated. I really can't abstract my mind correctly to understand this syntax!

Update:

A few people have pointed out the importance of the order in which the regex is constructed. However, I have no control over the text strings that will be searched, so I need a solution that works regardless of the order in which the terms appear.

Update:

I eventually used a PHP solution because of the deprecation of Twitter API 1.0. See pastebin for an example function (I know it's better to paste code here, but there's a lot of it):

Function: http://pastebin.com/MpWSGtHK
Usage: http://pastebin.com/pP2AHEvk

Thanks for all the help.

Recommended answer

A single regex is not the right tool for this, IMO:

/^(?=.*\bnano)(?=(?:.*\bregulat|.*toxic|(?=.*(?:\brisk\b|\bhazard\b))(?=.*(?:\bexposure\b|\brelease\b))))/i.test(subject)

returns true if the string fulfills the criteria you set forth, but I find nested lookaheads quite incomprehensible. If JavaScript supported commented regexes, it would look like this:

^                 # Anchor search to start of string
(?=.*\bnano)      # Assert that the string contains a word that starts with nano
(?=               # AND assert that the string contains...
 (?:              #  either
  .*\bregulat     #   a word starting with regulat
 |                #  OR
  .*toxic         #   any word containing toxic
 |                #  OR
  (?=             #   assert that the string contains
   .*             #    any string
   (?:            #    followed by
    \brisk\b      #    the word risk
   |              #    OR
    \bhazard\b    #    the word hazard
   )              #    (end of inner OR alternation)
  )               #   (end of first AND condition)
  (?=             #   AND assert that the string contains
   .*             #    any string
   (?:            #    followed by
    \bexposure\b  #    the word exposure
   |              #    OR
    \brelease\b   #    the word release
   )              #    (end of inner OR alternation)
  )               #   (end of second AND condition)
 )                #  (end of outer OR alternation)
)                 # (end of lookahead assertion)
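
JavaScript has no verbose-regex flag, but a similar effect can be approximated by assembling the pattern from commented string fragments and passing the result to new RegExp. The following is a minimal sketch and not part of the original answer; the names parts and queryRegex are made up for illustration.

```javascript
// Assemble the lookahead pattern from commented fragments.
// String.raw keeps backslashes such as \b intact without double escaping.
const parts = [
  String.raw`^`,                                  // anchor search to start of string
  String.raw`(?=.*\bnano)`,                       // a word starting with nano
  String.raw`(?=`,                                // AND one of the following:
  String.raw`(?:.*\bregulat`,                     //   a word starting with regulat
  String.raw`|.*toxic`,                           //   OR any word containing toxic
  String.raw`|(?=.*(?:\brisk\b|\bhazard\b))`,     //   OR (risk or hazard)
  String.raw`(?=.*(?:\bexposure\b|\brelease\b))`, //      AND (exposure or release)
  String.raw`)`,                                  // end of outer OR alternation
  String.raw`)`,                                  // end of lookahead assertion
];
const queryRegex = new RegExp(parts.join(""), "i");

console.log(queryRegex.test(
  "Workshop on the Second Regulatory Review on Nanomaterials, 30 January 2013, Brussels"
)); // true
```

The joined fragments produce the same pattern as the one-line regex, so this only changes how the pattern is written down, not what it matches.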

Note that the entire regex is composed of lookahead assertions, so the match result itself will always be the empty string.
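
To illustrate the point, here is a small sketch (the variable names re and m and the sample string are arbitrary): the pattern succeeds, but the matched text is empty, so it is only useful as a yes/no test.

```javascript
// The lookahead-only pattern from the answer.
const re = /^(?=.*\bnano)(?=(?:.*\bregulat|.*toxic|(?=.*(?:\brisk\b|\bhazard\b))(?=.*(?:\bexposure\b|\brelease\b))))/i;

const m = "Nanoparticle toxicity study".match(re);

console.log(m !== null);   // true: all assertions hold
console.log(m[0] === "");  // true: lookaheads consume no characters
console.log(m.index);      // 0: the match is anchored at the start
```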

Instead, you could use several simple regexes and combine them with ordinary boolean operators:

if (/\bnano/i.test(str) &&
    ( 
        /\bregulat|toxic/i.test(str) ||
        ( 
            /\b(?:risk|hazard)\b/i.test(str) &&
            /\b(?:exposure|release)\b/i.test(str)
        )
    )
)    /* all tests pass */
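
As a usage sketch, the separate tests can be wrapped in a predicate and passed to Array.prototype.filter. The function name matchesQuery and the sample array are hypothetical, chosen only for illustration; only the first sample string comes from the question.

```javascript
// Returns true when str satisfies:
// nano* AND (regulat* OR *toxic* OR ((risk OR hazard) AND (exposure OR release)))
function matchesQuery(str) {
  const hasNano = /\bnano/i.test(str);
  const hasRegulatOrToxic = /\bregulat|toxic/i.test(str);
  const hasRiskOrHazard = /\b(?:risk|hazard)\b/i.test(str);
  const hasExposureOrRelease = /\b(?:exposure|release)\b/i.test(str);
  return hasNano &&
    (hasRegulatOrToxic || (hasRiskOrHazard && hasExposureOrRelease));
}

// Sample data: the first entry is from the question, the rest are made up.
const titles = [
  "Workshop on the Second Regulatory Review on Nanomaterials, 30 January 2013, Brussels",
  "Hazard assessment and exposure limits for nanotubes",
  "Nanotechnology conference announced",
  "Risk of toxic release at chemical plant",
];

console.log(titles.filter(matchesQuery).length); // 2
```

Because each term is tested independently against the whole string, the result does not depend on the order in which the words appear, which is the requirement stated in the question's update.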
