Regex Performance Optimization Tips and Tricks
Question
After reading a pretty good article on regex optimization in Java, I was wondering: what are some other good tips for creating fast and efficient regular expressions?
Recommended answer
- Use the non-capturing group (?:pattern) when you need to repeat a grouping but don't need the captured value that comes from a traditional (capturing) group.
- Use the atomic group (or non-backtracking subexpression) (?>pattern) when applicable.
- Avoid catastrophic backtracking like the plague by designing your regular expressions to terminate early for non-matches.
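The first two tips can be illustrated with a minimal sketch. It uses Python's `re` module rather than Java or .NET, and the patterns and variable names (`capturing`, `non_capturing`, `greedy`, `atomic`) are made up for illustration. Native `(?>...)` syntax is only available in Python 3.11 and later, so on older versions the sketch swaps in the common lookahead-plus-backreference emulation of an atomic group.

```python
import re
import sys

# Capturing vs. non-capturing: both accept the same strings, but only the
# capturing version stores the text matched by the repeated group.
capturing = re.compile(r"(ab)+c")
non_capturing = re.compile(r"(?:ab)+c")

print(capturing.fullmatch("ababc").groups())      # ('ab',)
print(non_capturing.fullmatch("ababc").groups())  # ()

# Ordinary greedy quantifier: x+ can give back one character so that the
# final x still has something to match.
greedy = re.compile(r"x+x")

# Atomic group: once x+ has matched, the engine may not backtrack into it.
if sys.version_info >= (3, 11):
    atomic = re.compile(r"(?>x+)x")
else:
    # Emulation for older Pythons: a lookahead is not re-entered on
    # backtracking, and \1 then consumes exactly what it captured.
    atomic = re.compile(r"(?=(x+))\1x")

print(greedy.fullmatch("xxx") is not None)  # True
print(atomic.fullmatch("xxx") is not None)  # False
```

The atomic version fails on "xxx" by design: `x+` consumes all three characters and is not allowed to release one for the trailing `x`. That refusal to retry is what makes atomic groups cheap on non-matching input, so they only fit where the retries could never have produced a match you want.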
I created a video demonstrating these techniques. I started with the very poorly written regular expression from the catastrophic backtracking article, (x+x+)+y. Then, after a series of optimizations, benchmarking after every change, I made it 3 million times faster. The video is specific to .NET, but many of these things apply to most other regex flavors as well.
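As a rough illustration of that kind of before-and-after benchmark, here is a small sketch in Python. It is not the .NET code from the video: the helper `time_match` and the rewritten pattern `xx+y` are assumptions chosen for illustration, and the timings will differ by engine and machine. `xx+y` accepts the same strings as `(x+x+)+y` (two or more x's followed by y) but gives the engine only one way to split a run of x's.

```python
import re
import time

SLOW = re.compile(r"(x+x+)+y")  # nested quantifiers: many ways to split the x's
FAST = re.compile(r"xx+y")      # same strings accepted, a single way to split

def time_match(pattern, text):
    """Return (matched, seconds) for one match attempt anchored at the start."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start

if __name__ == "__main__":
    for n in (10, 14, 18):
        text = "x" * n  # no trailing y, so every attempt has to fail
        _, slow = time_match(SLOW, text)
        _, fast = time_match(FAST, text)
        print(f"n={n:2d}  slow={slow:.6f}s  fast={fast:.6f}s")
```

On a backtracking engine the slow pattern's time should grow roughly exponentially with the number of x's, while the rewritten pattern stays close to constant. Keep `n` small when experimenting, since each extra character roughly doubles the work for the slow pattern.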