Regex performance between languages or libraries

Question

I couldn't find anything on this subject, so I wonder whether anyone has compared the speed of regex matching across different languages. I would like to know which language evaluates regexes fastest, because in my current project I need to evaluate an enormous number of regular expressions continuously. The choice of language will be based mainly on this performance.

My assumption is that C/C++ will naturally be faster, but I want to avoid it if possible, and I'm not sure I'm right. For example, a C# library might call native code through P/Invoke, so the speed difference could be negligible. But I don't know which library to choose, or whether I need to write a wrapper around a C++ library (and if so, which one?).

Recommended answer

What kind of regexes? Will they use features such as lookaheads, lookbehinds, backreferences, reluctant (lazy) quantifiers, atomic groups, or possessive quantifiers?
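
Whether a given engine supports these features is easy to check directly. Below is a minimal sketch using Python's standard re module (chosen only for illustration; the answer does not discuss Python) that exercises a lookahead, a lookbehind, a backreference, and a lazy quantifier. Atomic groups and possessive quantifiers are left out because re accepts them only from Python 3.11 on. The variable names and sample strings are hypothetical.

```python
import re

# Lookahead: digits followed by "px", without consuming "px".
lookahead = re.findall(r"\d+(?=px)", "10px 20em 30px")

# Lookbehind (must be fixed-width in Python's re): digits preceded by "$".
lookbehind = re.findall(r"(?<=\$)\d+", "cost $5, tax $12, 7 items")

# Backreference: \1 must repeat whatever group 1 captured ("the the").
backref = re.search(r"\b(\w+) \1\b", "this is the the end").group(1)

# Reluctant (lazy) vs. greedy quantifier on the same input.
lazy = re.findall(r"<.+?>", "<a><b>")    # shortest matches
greedy = re.findall(r"<.+>", "<a><b>")   # longest match

if __name__ == "__main__":
    print(lookahead, lookbehind, backref, lazy, greedy)
```

An engine limited to the constructs used by regex-dna would reject most of these patterns, which is why the feature question has to come before the speed question.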

Other answerers have linked to the regex-dna benchmark, but it uses only the most basic features shared by all regex flavors, such as the Kleene star (*) and alternation (|). So while the GNU C/C++ implementations appear to be the clear winners there, they won't do you any good if you need any of the features listed above.
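
Because regex-dna exercises only these basic constructs, it is worth timing the patterns you actually intend to use. The following is a rough sketch of such a harness, assuming Python's re module as the engine under test. time_patterns is a hypothetical helper name, the two alternation patterns are written in the style of regex-dna, and the random input is synthetic rather than benchmark data.

```python
import random
import re
import time


def time_patterns(patterns, text, repeats=3):
    """Return {pattern: (match_count, best_seconds)} for each pattern.

    Each pattern is compiled once, outside the timed region, so only
    matching is measured. The best of `repeats` runs is kept to reduce noise.
    """
    results = {}
    for pattern in patterns:
        compiled = re.compile(pattern)
        best = float("inf")
        count = 0
        for _ in range(repeats):
            start = time.perf_counter()
            count = sum(1 for _ in compiled.finditer(text))
            best = min(best, time.perf_counter() - start)
        results[pattern] = (count, best)
    return results


if __name__ == "__main__":
    # Synthetic DNA-like input; a real comparison should use your own data.
    rng = random.Random(42)
    dna = "".join(rng.choice("acgt") for _ in range(500_000))
    variants = ["agggtaaa|tttaccct", "[cgt]gggtaaa|tttaccc[acg]"]
    for pattern, (count, secs) in time_patterns(variants, dna).items():
        print(f"{pattern}: {count} matches in {secs * 1000:.1f} ms")
```

Porting the same loop to each candidate language and library, with identical patterns and input, gives a like-for-like comparison for your own workload.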

Another consideration is Unicode support. If you're dealing with actual text (and not data represented as text, like in the regex-dna benchmark), you should use a regex flavor with good Unicode support.
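
As an illustration of what Unicode support means in practice, here is a small sketch using Python's re module (an assumption; any Unicode-aware engine would serve). For str patterns, \w and case-insensitive matching follow Unicode rules by default, while the re.ASCII flag restricts \w to [a-zA-Z0-9_]. The sample text uses escapes for ï, é and ß so the source stays ASCII.

```python
import re

# "naïve café Straße", written with escapes.
text = "na\u00efve caf\u00e9 Stra\u00dfe"

# Default for str patterns: \w includes non-ASCII letters.
unicode_words = re.findall(r"\w+", text)

# re.ASCII: \w is limited to [a-zA-Z0-9_], so accented words get split.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

# Case-insensitive matching also applies Unicode case rules ("é" vs "É").
case_insensitive = re.fullmatch("caf\u00e9", "CAF\u00c9", flags=re.IGNORECASE) is not None

if __name__ == "__main__":
    print(unicode_words)
    print(ascii_words)
    print(case_insensitive)
```

An engine that behaves like the re.ASCII case silently breaks words apart in real text, which is the kind of difference a speed-only benchmark will not reveal.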

I suggest you look into C#. The .NET regex flavor does not have a reputation for being slow (which, IMO, is the only sensible thing you can say about regex speed), and for performance-critical applications it offers the option of compiling regexes directly to CIL bytecode (the RegexOptions.Compiled option) for a substantial performance boost.
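
Python has no direct counterpart to RegexOptions.Compiled, but the underlying principle of paying the compilation cost once still applies. The re module compiles each pattern to an internal representation before matching, so compiling once with re.compile and reusing the pattern object keeps that work out of hot loops (the module-level functions also cache recently used patterns, which softens the cost of not doing this). A minimal sketch, assuming a hypothetical log format and a hypothetical count_levels helper:

```python
import re

# Compiled once at import time and reused for every line.
LOG_LINE = re.compile(r"^(?P<level>ERROR|WARN|INFO) (?P<msg>.*)$")


def count_levels(lines):
    """Count log lines per level; lines that don't match are ignored."""
    counts = {}
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            level = m.group("level")
            counts[level] = counts.get(level, 0) + 1
    return counts


if __name__ == "__main__":
    sample = ["ERROR disk full", "INFO ok", "ERROR again", "noise"]
    print(count_levels(sample))
```

This is only an analogy to the .NET option, not an equivalent: it avoids repeated compilation but does not generate native or CIL code.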
