Regex performance between languages or libraries
Question
I couldn't find anything about this subject, so I wonder if anyone has compared the speed of regex matching among different languages. I would like to know which language evaluates regexes fastest, because in my current project I need to evaluate an enormous number of regular expressions constantly. The choice of language will be based mainly on this performance.
My guess is that C/C++ will naturally be faster, but I want to avoid it if possible, and I'm not sure I'm right. For example, a C# library may use native code through P/Invoke, so the speed difference may be negligible. But I don't know which library to choose, or whether I need to write a wrapper around a C++ library (and if so, which one?).
Recommended answer
What kind of regexes? Will they use features like lookaheads, lookbehinds, backreferences, reluctant quantifiers, atomic groups, possessive quantifiers, etc., etc.?
Other responders have linked to the regex-dna benchmark, but it only uses the most basic features shared by all regex flavors, like the Kleene star (*) and alternation (|). So, while the GNU C/C++ implementations seem to be the clear winners, they won't do you any good if you need any of the features I listed above.
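To make the distinction concrete, here is a small sketch of the two kinds of pattern. Python's `re` module is used purely for illustration (it is not one of the flavors discussed above), and the patterns and test strings are made up for this example.

```python
import re

# Basic features (Kleene star, alternation): available in virtually every flavor.
basic = re.compile(r"ab*c|xyz")

# Backreference: \1 must repeat exactly what group 1 captured.
doubled_word = re.compile(r"\b(\w+) \1\b")

# Lookahead: match "foo" only when "bar" follows, without consuming "bar".
lookahead = re.compile(r"foo(?=bar)")

# Reluctant (lazy) quantifier: match as few characters as possible.
lazy = re.compile(r"<.+?>")


def demo():
    """Run each illustrative pattern against a small made-up input."""
    return {
        "basic": bool(basic.fullmatch("abbbc")),
        "backreference": doubled_word.search("it is is here").group(0),
        "lookahead": lookahead.search("foobar").span(),
        "lazy": lazy.findall("<a><b>"),
    }
```

Only the first pattern stays within the feature set that the regex-dna benchmark exercises; the other three rely on features that a bare-bones engine may not offer.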
Another consideration is Unicode support. If you're dealing with actual text (and not data represented as text, like in the regex-dna benchmark), you should use a regex flavor with good Unicode support.
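As a rough illustration of why this matters, the sketch below runs the same pattern with and without Unicode-aware character classes. It again uses Python's `re` module only as a stand-in; the sample string is an assumption chosen for the example.

```python
import re

# Sample text with non-ASCII letters ("naive" with a diaeresis, "cafe" with an acute accent).
text = "na\u00efve caf\u00e9"

# For str patterns, \w is Unicode-aware by default in Python 3.
unicode_words = re.findall(r"\w+", text)

# With re.ASCII, \w only matches [a-zA-Z0-9_], so accented letters split the words.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)
```

With Unicode-aware matching the two words come back whole; in ASCII-only mode the accented letters are treated as separators and the words are broken into fragments.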
I suggest you look into C#. The .NET regex flavor does not have a reputation for being slow (which is the only sensible thing you can say about regex speeds, IMO), and for performance-critical applications it provides the option of compiling a regex directly to byte code (RegexOptions.Compiled) for a substantial performance boost.
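Since reputation is about all one can go on, it may be worth timing your own patterns against representative input before committing to a language. The following is a minimal sketch of such a harness, written in Python only for illustration; `time_pattern`, the sample patterns, and the sample text are all hypothetical and should be replaced with the ones from your project.

```python
import re
import timeit


def time_pattern(pattern, text, repeats=1000):
    """Return the seconds taken by `repeats` searches of `pattern` over `text`."""
    compiled = re.compile(pattern)  # compile once, outside the timed loop
    return timeit.timeit(lambda: compiled.search(text), number=repeats)


# Hypothetical input and patterns; substitute real ones from your workload.
sample_text = "agggtaaa" * 200 + "tttaccct"
patterns = {
    "star_and_alternation": r"agggtaaa|tttaccct",
    "backreference": r"(a|g)\1\1t",
}


def run():
    """Time every sample pattern and return a name -> seconds mapping."""
    return {name: time_pattern(p, sample_text) for name, p in patterns.items()}
```

The absolute numbers mean little on their own; the point is to run the equivalent harness in each candidate language with the same patterns and input, and compare the results.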