Using static Regex.IsMatch vs creating an instance of Regex

Question

In C#, should you have code like this:

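public static string importantRegex = "magic!";

public void F1(string input){
  //code
  // The static IsMatch takes the input first, then the pattern.
  if(Regex.IsMatch(input, importantRegex)){
    //codez in here.
  }
  //more code
}
public void main(){
  F1("some input");
/*
  some stuff happens......
*/
  F1("some other input");
}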

Or should you persist an instance of a Regex containing the important pattern? What is the cost of using Regex.IsMatch? I imagine an NFA is created for each Regex instance, and from what I understand, creating that NFA is nontrivial.

Recommended answer

In a rare departure from my typical egotism, I'm kind of reversing myself on this answer.

My original answer, preserved below, was based on an examination of version 1.1 of the .NET framework. This is pretty shameful, since .NET 2.0 had been out for over three years at the time of my answer, and it contained changes to the Regex class that significantly affect the difference between the static and instance methods.

In .NET 2.0 (and 4.0), the static IsMatch function is defined as follows:

public static bool IsMatch(string input, string pattern){
    return new Regex(pattern, RegexOptions.None, true).IsMatch(input);
}

The significant difference here is that little true passed as the third argument. It corresponds to a parameter named "useCache". When it is true, the parsed tree is retrieved from the cache on the second and subsequent uses.
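
As a rough illustration of that cache (a sketch, not part of the original answer): the cache used by the static methods has a size limit exposed through the public Regex.CacheSize property, which defaults to 15 entries. The class name CacheDemo and the helper LooksLikeId are hypothetical names chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CacheDemo
{
    // Hypothetical helper: relies on the static method's internal cache
    // instead of holding on to a Regex instance.
    public static bool LooksLikeId(string input)
    {
        return Regex.IsMatch(input, "^[a-z]+[0-9]+$");
    }

    public static void Main()
    {
        // The cache serves patterns passed to static Regex methods.
        // Raising the limit may help if many distinct patterns go through static calls.
        Regex.CacheSize = 50;

        Console.WriteLine(LooksLikeId("abc123")); // first call has to parse the pattern
        Console.WriteLine(LooksLikeId("xyz789")); // later calls can reuse the cached parse
        Console.WriteLine(LooksLikeId("123abc"));
    }
}
```

If an application pushes more distinct patterns through the static methods than the cache holds, older entries are evicted and have to be parsed again, so the static methods stop benefiting from the cache for those patterns.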

This caching eats up most, but not all, of the performance difference between the static and instance methods. In my tests, the static IsMatch method was still about 20% slower than the instance method, but that amounted to only about a half-second increase when run 100 times over a set of 10,000 input strings (a total of 1 million operations).

This 20% slowdown can still be significant in some scenarios. If you find yourself regexing hundreds of millions of strings, you'll probably want to take every step you can to make it more efficient. But I'd bet that 99% of the time, you're using a particular Regex no more than a handful of times, and the extra millisecond you lose to the static method won't be even close to noticeable.

Props to devgeezer, who pointed this out almost a year ago, although no one seemed to notice.

My old answer follows:

The static IsMatch function is defined as follows:

public static bool IsMatch(string input, string pattern){
    return new Regex(pattern).IsMatch(input);
}

And, yes, initialization of a Regex object is not trivial. You should use the static IsMatch (or any of the other static Regex functions) as a quick shortcut only for patterns that you will use only once. If you will reuse the pattern, it's worth it to reuse a Regex object, too.
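
One way to reuse a Regex object, sketched here under assumptions, is to keep it in a static readonly field so it is constructed only once. The class name Importer and the string parameter on F1 are hypothetical; the pattern is the one from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public class Importer
{
    // Built once when the type is first used, then shared by every call.
    private static readonly Regex ImportantRegex = new Regex("magic!");

    public bool F1(string input)
    {
        // Reuses the already-constructed instance; no cache lookup is involved.
        return ImportantRegex.IsMatch(input);
    }

    public static void Main()
    {
        var importer = new Importer();
        Console.WriteLine(importer.F1("this is magic!"));
        Console.WriteLine(importer.F1("nothing here"));
    }
}
```

The Regex class is documented as immutable and thread-safe, so a single shared instance like this can generally be used from multiple threads.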

As to whether or not you should specify RegexOptions.Compiled, as suggested by Jon Skeet, that's another story. The answer there is: it depends. For simple patterns or for patterns used only a handful of times, it may well be faster to use a non-compiled instance. You should definitely profile before deciding. The cost of compiling a regular expression object is quite large indeed, and may not be worth it.

Take the following example:

using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

const int count = 10000;

string pattern = "^[a-z]+[0-9]+$";
string input   = "abc123";

Stopwatch sw = Stopwatch.StartNew();
for(int i = 0; i < count; i++)
    Regex.IsMatch(input, pattern);
Console.WriteLine("static took {0} seconds.", sw.Elapsed.TotalSeconds);

sw.Reset();
sw.Start();
Regex rx = new Regex(pattern);
for(int i = 0; i < count; i++)
    rx.IsMatch(input);
Console.WriteLine("instance took {0} seconds.", sw.Elapsed.TotalSeconds);

sw.Reset();
sw.Start();
rx = new Regex(pattern, RegexOptions.Compiled);
for(int i = 0; i < count; i++)
    rx.IsMatch(input);
Console.WriteLine("compiled took {0} seconds.", sw.Elapsed.TotalSeconds);

At count = 10000, as listed, the second output (the non-compiled instance) is fastest. Increase count to 100000, and the compiled version wins.
