Optimize performance with multiple calls to Regex.IsMatch on large text
Question
I have a long text (50-60 KB) and I need to run several regular expressions against it (about 100 rules in total). However, this is so slow that it is essentially unusable.
All I have done is create a loop over the rules, where each rule calls Regex.IsMatch().
Is there a way to optimize this approach?
Update
Sample code showing what each rule does:
public class SomeRegexInterceptor : ValidatorBase
{
    private readonly Regex _rgx = new Regex("some regex", RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public override void Intercept(string html, ValidationResultCollection collection)
    {
        if (!_rgx.IsMatch(html)) return;
        /* do something irrelevant here */
    }
}
Recommended answer
The most important thing when using Regex is how and where you declare your Regex objects. Never initialize a Regex object inside a loop: constructing one, particularly with RegexOptions.Compiled, is expensive, so build it once and reuse the instance.
Create a static class and add public static readonly Regex fields with the RegexOptions.Compiled flag set.
Then use them wherever you need them, for example MyRegexClass.LeadingWhitespace.Replace(str, string.Empty).
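As a minimal sketch of that layout: the class name MyRegexClass and the field LeadingWhitespace come from the answer, but the pattern `^\s+` and the Demo helper are assumptions added for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

// Holder for shared patterns. The pattern below is an assumed example,
// not one taken from the original answer.
public static class MyRegexClass
{
    // Built once when the class is first used, then reused by every caller.
    public static readonly Regex LeadingWhitespace =
        new Regex(@"^\s+", RegexOptions.Compiled);
}

// Hypothetical caller showing how the shared field is used.
public static class Demo
{
    public static string TrimLeading(string str)
    {
        // No Regex is constructed here; only the cached instance runs.
        return MyRegexClass.LeadingWhitespace.Replace(str, string.Empty);
    }
}
```

Because the field is static and readonly, the cost of compiling the pattern is paid once per process rather than once per validated document.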
Note that if you need to call Regex.Replace, you do not need to check for a match with Regex.IsMatch first: Replace returns the input unchanged when nothing matches.
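A short sketch of this point, assuming a hypothetical digit-masking rule (the class, method, and pattern names are illustrative and not from the original post):

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceWithoutIsMatch
{
    // Assumed example pattern: one or more decimal digits.
    private static readonly Regex Digits =
        new Regex(@"\d+", RegexOptions.Compiled);

    public static string Mask(string input)
    {
        // A preceding IsMatch call would scan the text a second time.
        // Replace alone is enough: with no match the text comes back unchanged.
        return Digits.Replace(input, "#");
    }
}
```

Skipping the extra IsMatch call matters most on large inputs, where each additional scan of a 50-60 KB string is multiplied by the number of rules.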
Read and follow the recommendations outlined in Best Practices for Regular Expressions in the .NET Framework, namely:
- Consider the Input Source
- Handle Object Instantiation Appropriately
- Take Charge of Backtracking
- Use Time-out Values
- Capture Only When Necessary
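Two of these recommendations, time-out values and capturing only when necessary, can be sketched as follows. The rule pattern, the 500 ms limit, and the names SafeRules and TryIsMatch are assumptions chosen for illustration; a suitable time-out depends on your input and patterns.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SafeRules
{
    // Hypothetical rule. ExplicitCapture makes the unnamed group
    // non-capturing, and the TimeSpan argument bounds each match attempt.
    private static readonly Regex Rule = new Regex(
        @"<(script|iframe)\b",
        RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture,
        TimeSpan.FromMilliseconds(500));

    // Returns false if the match attempt timed out; otherwise sets
    // 'matched' to the result of IsMatch and returns true.
    public static bool TryIsMatch(string html, out bool matched)
    {
        try
        {
            matched = Rule.IsMatch(html);
            return true;
        }
        catch (RegexMatchTimeoutException)
        {
            matched = false;
            return false;
        }
    }
}
```

With a time-out in place, a single rule that backtracks badly on one document surfaces as a handled exception instead of stalling the whole validation loop.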
Also, consider processing the file line by line, and avoid regular expressions wherever you can do without them.
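For a rule that only looks for a literal string, a line-by-line scan with plain string methods might look like the sketch below. The LineScanner class and its marker parameter are hypothetical; whether a given rule can be expressed this way depends on the rule.

```csharp
using System;
using System.IO;

public static class LineScanner
{
    // Counts the lines that contain 'marker', ignoring case, without Regex.
    public static int CountLinesContaining(string text, string marker)
    {
        int count = 0;
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // An ordinal comparison avoids culture-sensitive matching overhead.
                if (line.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0)
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```

Note that patterns relying on RegexOptions.Singleline to span several lines cannot be checked one line at a time, so this only applies to rules whose matches fit within a single line.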