Optimize performance with multiple calls to Regex.IsMatch on large text


Problem description

I have a long text (50-60 KB) and I need to run a set of regular expressions against it (about 100 rules in total). However, this is so slow that it is effectively unusable.

All I have done is loop over the rules, with each rule calling Regex.IsMatch() on the full text.

Is there a way to optimize this approach?

Update

Sample code of what each rule is doing:

using System.Text.RegularExpressions;

public class SomeRegexInterceptor : ValidatorBase
{
    // Constructed once per interceptor instance, not once per call.
    private readonly Regex _rgx = new Regex("some regex", RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public override void Intercept(string html, ValidationResultCollection collection)
    {
        // Bail out early when the rule does not apply to this document.
        if (!_rgx.IsMatch(html)) return;

        /* do something irrelevant here */
    }
}


Recommended answer

The most important thing about using Regex is how and where you declare the Regex object. Never initialize a Regex object inside a loop: every construction parses the pattern again, and with RegexOptions.Compiled it also compiles it again.

Create a static class and add public static readonly Regex fields with the RegexOptions.Compiled flag set.

Then use them wherever you need them, for example MyRegexClass.LeadingWhitespace.Replace(str, string.Empty).
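A minimal sketch of such a holder class might look like the following. The class name MyRegexClass and the field LeadingWhitespace come from the answer; the actual patterns, the ScriptTag field, and the LeadingWhitespaceExample helper are assumptions made up for illustration.

```csharp
using System.Text.RegularExpressions;

// Holder for shared Regex instances. The patterns below are illustrative
// assumptions, not the rules from the question.
public static class MyRegexClass
{
    // Each field is constructed once, when the class is first used,
    // and then shared by every caller.
    public static readonly Regex LeadingWhitespace =
        new Regex(@"^\s+", RegexOptions.Compiled);

    public static readonly Regex ScriptTag =
        new Regex(@"<script\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
}

// Hypothetical caller showing how the shared fields are used.
public static class LeadingWhitespaceExample
{
    public static string TrimStart(string str)
    {
        return MyRegexClass.LeadingWhitespace.Replace(str, string.Empty);
    }

    public static bool ContainsScript(string html)
    {
        return MyRegexClass.ScriptTag.IsMatch(html);
    }
}
```

Regex instances are immutable and safe to share across threads, so a single static field per rule is enough even if validation runs concurrently.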

Note that if you need to use Regex.Replace, you do not need to check for a match with Regex.IsMatch first.
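To illustrate the point, here is a small sketch. The class ReplaceWithoutPrecheck and its pattern are hypothetical; the only thing relied on is that Regex.Replace returns the input unchanged when nothing matches, so a preceding IsMatch call just scans the text a second time.

```csharp
using System.Text.RegularExpressions;

// Hypothetical example; the name and the pattern are assumptions.
public static class ReplaceWithoutPrecheck
{
    private static readonly Regex MultipleSpaces =
        new Regex(@" {2,}", RegexOptions.Compiled);

    public static string CollapseSpaces(string input)
    {
        // No IsMatch guard needed: when nothing matches, Replace hands
        // back the original text, so the guard would only add a scan.
        return MultipleSpaces.Replace(input, " ");
    }
}
```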

Read and follow the recommendations outlined in Best Practices for Regular Expressions in the .NET Framework, namely:

  • Consider the Input Source
  • Handle Object Instantiation Appropriately
  • Take Charge of Backtracking
  • Use Time-out Values
  • Capture Only When Necessary
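Two of these recommendations, time-out values and capturing only when necessary, can be sketched roughly as follows. The class GuardedRules, the pattern, and the 2-second limit are assumptions chosen for illustration; the Regex constructor overload that takes a TimeSpan requires .NET Framework 4.5 or later.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical rule; name, pattern and time-out are assumptions.
public static class GuardedRules
{
    // ExplicitCapture makes the unnamed (div|span) group non-capturing,
    // so the engine does not record captures that IsMatch never reads.
    // The TimeSpan argument caps how long a single match attempt may run.
    private static readonly Regex InlineStyle = new Regex(
        @"<(div|span)\b[^>]*\bstyle\s*=",
        RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture,
        TimeSpan.FromSeconds(2));

    public static bool HasInlineStyle(string html)
    {
        try
        {
            return InlineStyle.IsMatch(html);
        }
        catch (RegexMatchTimeoutException)
        {
            // Treated as "no match" here; a real validator would more
            // likely log the offending rule and input.
            return false;
        }
    }
}
```

With about 100 rules, a time-out also makes it easy to find the few patterns that backtrack excessively, since those are the ones that throw.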

Also, consider processing the file line by line, and avoid regular expressions wherever you can do without them.
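As a rough sketch of both ideas, a rule that only looks for a literal string can read the text line by line and use String.IndexOf instead of Regex. The class PlainTextRules and its method are hypothetical, and the approach assumes the rule never needs to match across line breaks, which may not hold for rules that rely on RegexOptions.Singleline.

```csharp
using System;
using System.IO;

// Hypothetical helper; assumes the rule is a plain literal that
// never spans more than one line.
public static class PlainTextRules
{
    // Returns the 1-based number of the first line that contains the
    // literal (ignoring case), or -1 if no line contains it.
    public static int FirstLineContaining(string text, string literal)
    {
        using (var reader = new StringReader(text))
        {
            string line;
            int lineNumber = 0;
            while ((line = reader.ReadLine()) != null)
            {
                lineNumber++;
                // Ordinal comparison avoids culture-sensitive matching.
                if (line.IndexOf(literal, StringComparison.OrdinalIgnoreCase) >= 0)
                {
                    return lineNumber;
                }
            }
        }
        return -1;
    }
}
```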
