Hundreds of RegEx on one string


Problem description

I am using Java/Groovy to find matches (and extract them) in a string with regular expressions. In terms of performance, what is the best way to find the matches of 200 or more regexes in a string of, let's say, 5000 characters? In a nutshell, is it possible to avoid scanning the string once for each regex?

I can use the Pattern and Matcher classes provided by Java, but then I will have to compile 200 patterns and pass the string to a matcher 200 times. Is that the only way of doing it?

Solution

If your regexes do not have common matches (that is, no two of them can match the same text), you can always combine them into one gigantic regex by using alternation, e.g.

(regex1)|(regex2)|...|(regexN)
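As a rough illustration of the alternation idea, here is a minimal sketch in Java. The class name `CombinedRegexScanner`, the generated group names `r0`, `r1`, ... and the sample patterns are my own assumptions, not something from the question. It wraps each regex in a named group so that after every `find()` you can tell which alternative matched. It assumes the individual regexes contain no numbered backreferences and no named groups that clash with the generated names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: combines many regexes into one alternation.
class CombinedRegexScanner {

    // Builds (?<r0>regex0)|(?<r1>regex1)|... and compiles it once.
    static Pattern combine(List<String> regexes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < regexes.size(); i++) {
            if (i > 0) {
                sb.append('|');
            }
            sb.append("(?<r").append(i).append('>')
              .append(regexes.get(i)).append(')');
        }
        return Pattern.compile(sb.toString());
    }

    // Walks the input once and returns one "index:text" entry per match,
    // where index is the position of the regex that matched.
    static List<String> scan(List<String> regexes, String input) {
        Matcher m = combine(regexes).matcher(input);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            // Only the group of the alternative that matched is non-null.
            for (int i = 0; i < regexes.size(); i++) {
                if (m.group("r" + i) != null) {
                    hits.add(i + ":" + m.group());
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample patterns, chosen only for illustration.
        List<String> regexes = Arrays.asList(
                "\\d+", "[a-z]+@[a-z]+\\.com", "#[0-9a-f]{6}");
        String input = "order 42 from bob@example.com in #ff8800";
        System.out.println(scan(regexes, input));
        // prints [0:42, 1:bob@example.com, 2:#ff8800]
    }
}
```

Note that at each position the alternatives are tried in order and only the first one that succeeds is reported, so overlapping matches from different regexes are lost. That is why this only works when the regexes do not have common matches. Also, java.util.regex is a backtracking engine, so a single combined pattern is not guaranteed to be faster than 200 separate passes; it is worth measuring on your real patterns.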

However, given the complexity of your problem, I think you should consider switching from regexes to a proper scanner/parser combination. It will take time upfront, but the resulting solution will be much more manageable. Why don't you check out Antlr?
