How do I timeout Regex operations to prevent hanging in .NET 4.5?


Problem description

There are times when it is useful to limit how long a regex pattern-matching operation may run. In particular, when matching data against user-supplied patterns, a pattern might perform poorly due to nested quantifiers and excessive backtracking (see catastrophic backtracking). One way to apply a timeout is to run the regex asynchronously, but this is tedious and clutters the code.

According to what's new in the .NET Framework 4.5 Developer Preview it looks like there's a new built-in approach to support this:

Ability to limit how long the regular expression engine will attempt to resolve a regular expression before it times out.

How can I use this feature? Also, what do I need to be aware of when using it?

Note: I'm asking and answering this question since it's encouraged.

Solution

I recently researched this topic since it interested me and will cover the main points here. The relevant MSDN documentation is available here and you can check out the Regex class to see the new overloaded constructors and static methods. The code samples can be run with Visual Studio 11 Developer Preview.

The Regex class accepts a TimeSpan to specify the timeout duration. You can specify a timeout at a macro and micro level in your application, and they can be used together:

  • Set the "REGEX_DEFAULT_MATCH_TIMEOUT" property using the AppDomain.SetData method (macro application-wide scope)
  • Pass the matchTimeout parameter (micro localized scope)

When the AppDomain property is set, all Regex operations use that value as the default timeout. To override the application-wide default, simply pass a matchTimeout value to the regex constructor or static method. If an AppDomain default isn't set and matchTimeout isn't specified, pattern matching will not time out (i.e., the original pre-.NET 4.5 behavior).
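
The two scopes can be sketched side by side. This is a minimal illustration rather than code from the MSDN documentation; the class and method names (TimeoutScopes, WithExplicitTimeout, WithDefaultTimeout) are hypothetical, and the comment about the infinite default assumes no AppDomain default has been set in the process.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper names, used only for illustration.
public static class TimeoutScopes
{
    // Micro scope: the matchTimeout argument applies to this Regex instance
    // only and takes precedence over any AppDomain default.
    public static Regex WithExplicitTimeout(string pattern, TimeSpan matchTimeout)
    {
        return new Regex(pattern, RegexOptions.None, matchTimeout);
    }

    // Macro scope: no matchTimeout is passed, so the instance picks up the
    // AppDomain default if one was set, or Regex.InfiniteMatchTimeout otherwise.
    public static Regex WithDefaultTimeout(string pattern)
    {
        return new Regex(pattern);
    }
}
```

Reading the MatchTimeout property of each returned instance shows which scope applied: the first reports the value that was passed in, and the second reports Regex.InfiniteMatchTimeout when no AppDomain default exists.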

There are 2 main exceptions to handle:

  • RegexMatchTimeoutException: thrown when a timeout occurs.
  • ArgumentOutOfRangeException: thrown when "matchTimeout is negative or greater than approximately 24 days." In addition, a TimeSpan value of zero will cause this to be thrown.

Although negative values are not allowed, there is one exception: a value of -1 ms is accepted. This is the value of the Regex.InfiniteMatchTimeout field, and it indicates that a match should not time out (i.e., the original pre-.NET 4.5 behavior).
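
One way to keep these rules in a single place is a small factory that reports whether a timeout was accepted. The sketch below is only an illustration: RegexFactory and TryCreate are hypothetical names, and the helper leans on the constructor's own validation instead of re-implementing the range check.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, used only for illustration.
public static class RegexFactory
{
    // Returns true and a usable Regex when matchTimeout is accepted;
    // returns false when the Regex constructor rejects the timeout.
    // An unparsable pattern still throws ArgumentException to the caller.
    public static bool TryCreate(string pattern, TimeSpan matchTimeout, out Regex regex)
    {
        try
        {
            regex = new Regex(pattern, RegexOptions.None, matchTimeout);
            return true;
        }
        catch (ArgumentOutOfRangeException)
        {
            // Zero, negative (other than -1 ms), or too large a timeout.
            regex = null;
            return false;
        }
    }
}
```

With this helper, TimeSpan.FromSeconds(4) and Regex.InfiniteMatchTimeout are accepted, while TimeSpan.Zero, TimeSpan.FromSeconds(-10) and TimeSpan.FromDays(30) are rejected.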

Using the matchTimeout parameter

In the following example I'll demonstrate both valid and invalid timeout scenarios and how to handle them:

string input = "The quick brown fox jumps over the lazy dog.";
string pattern = @"([a-z ]+)*!";
var timeouts = new[]
{
    TimeSpan.FromSeconds(4),     // valid
    TimeSpan.FromSeconds(-10)    // invalid
};

foreach (var matchTimeout in timeouts)
{
    Console.WriteLine("Input: " + matchTimeout);
    try
    {
        bool result = Regex.IsMatch(input, pattern,
                                    RegexOptions.None, matchTimeout);
    }
    catch (RegexMatchTimeoutException ex)
    {
        Console.WriteLine("Match timed out!");
        Console.WriteLine("- Timeout interval specified: " + ex.MatchTimeout);
        Console.WriteLine("- Pattern: " + ex.Pattern);
        Console.WriteLine("- Input: " + ex.Input);
    }
    catch (ArgumentOutOfRangeException ex)
    {
        Console.WriteLine(ex.Message);
    }
    Console.WriteLine();
}

When using an instance of the Regex class you have access to the MatchTimeout property:

string input = "The English alphabet has 26 letters";
string pattern = @"\d+";
var matchTimeout = TimeSpan.FromMilliseconds(10);
var sw = Stopwatch.StartNew();
try
{
    var re = new Regex(pattern, RegexOptions.None, matchTimeout);
    bool result = re.IsMatch(input);
    sw.Stop();

    Console.WriteLine("Completed match in: " + sw.Elapsed);
    Console.WriteLine("MatchTimeout specified: " + re.MatchTimeout);
    Console.WriteLine("Matched with {0} to spare!",
                         re.MatchTimeout.Subtract(sw.Elapsed));
}
catch (RegexMatchTimeoutException ex)
{
    sw.Stop();
    Console.WriteLine(ex.Message);
}

Using the AppDomain property

The "REGEX_DEFAULT_MATCH_TIMEOUT" property is used to set an application-wide default:

AppDomain.CurrentDomain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT",
                                TimeSpan.FromSeconds(2));

If this property is set to an invalid TimeSpan value or an invalid object, a TypeInitializationException will be thrown when attempting to use a regex.
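
Because a bad value only surfaces later as a TypeInitializationException, it may help to check the value before storing it. The following is a sketch under stated assumptions: RegexDefaults and its members are hypothetical names, and the 24-day ceiling is a deliberately conservative bound picked for this example from the "approximately 24 days" wording, not the exact limit. The sketch also refuses -1 ms, since omitting the default already gives infinite matching.

```csharp
using System;

// Hypothetical helper, used only for illustration.
public static class RegexDefaults
{
    // Conservative upper bound chosen for this sketch (an assumption):
    // anything above 24 days is refused.
    private static readonly TimeSpan MaxAllowed = TimeSpan.FromDays(24);

    // True for a positive timeout that stays under the conservative bound.
    public static bool IsAcceptable(TimeSpan timeout)
    {
        return timeout > TimeSpan.Zero && timeout <= MaxAllowed;
    }

    // Stores the application-wide default only when the value passes the
    // check, so a bad value never reaches the Regex type initializer.
    public static bool TrySetDefaultMatchTimeout(TimeSpan timeout)
    {
        if (!IsAcceptable(timeout))
        {
            return false;
        }
        AppDomain.CurrentDomain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT", timeout);
        return true;
    }
}
```

The exception type suggests the property is read when the Regex type is first initialized, so a call such as RegexDefaults.TrySetDefaultMatchTimeout(TimeSpan.FromSeconds(2)) presumably belongs at application startup, before any regex is used.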

Example with a valid property value:

// AppDomain default set somewhere in your application
AppDomain.CurrentDomain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT",
                                TimeSpan.FromSeconds(2));

// regex use elsewhere...
string input = "The quick brown fox jumps over the lazy dog.";
string pattern = @"([a-z ]+)*!";

var sw = Stopwatch.StartNew();
try
{
    // no timeout specified, defaults to AppDomain setting
    bool result = Regex.IsMatch(input, pattern);
    sw.Stop();
}
catch (RegexMatchTimeoutException ex)
{
    sw.Stop();
    Console.WriteLine("Match timed out!");
    Console.WriteLine("Applied Default: " + ex.MatchTimeout);
}
catch (ArgumentOutOfRangeException ex)
{
    // not thrown in this example; handled here for completeness
    sw.Stop();
}
catch (TypeInitializationException ex)
{
    sw.Stop();
    Console.WriteLine("TypeInitializationException: " + ex.Message);
    Console.WriteLine("InnerException: {0} - {1}",
        ex.InnerException.GetType().Name, ex.InnerException.Message);
}
Console.WriteLine("AppDomain Default: {0}",
    AppDomain.CurrentDomain.GetData("REGEX_DEFAULT_MATCH_TIMEOUT"));
Console.WriteLine("Stopwatch: " + sw.Elapsed);

Using the above example with an invalid (negative) value would cause the exception to be thrown. The code that handles it writes the following message to the console:

TypeInitializationException: The type initializer for 'System.Text.RegularExpressions.Regex' threw an exception.

InnerException: ArgumentOutOfRangeException - Specified argument was out of the range of valid values. Parameter name: AppDomain data 'REGEX_DEFAULT_MATCH_TIMEOUT' contains an invalid value or object for specifying a default matching timeout for System.Text.RegularExpressions.Regex.

In both examples the ArgumentOutOfRangeException isn't thrown. For completeness the code shows all the exceptions you can handle when working with the new .NET 4.5 Regex timeout feature.

Overriding AppDomain default

Overriding the AppDomain default is done by specifying a matchTimeout value. In the next example the match times out in 2 seconds instead of the default of 5 seconds.

AppDomain.CurrentDomain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT",
                                TimeSpan.FromSeconds(5));

string input = "The quick brown fox jumps over the lazy dog.";
string pattern = @"([a-z ]+)*!";

var sw = Stopwatch.StartNew();
try
{
    var matchTimeout = TimeSpan.FromSeconds(2);
    bool result = Regex.IsMatch(input, pattern,
                                RegexOptions.None, matchTimeout);
    sw.Stop();
}
catch (RegexMatchTimeoutException ex)
{
    sw.Stop();
    Console.WriteLine("Match timed out!");
    Console.WriteLine("Applied Override: " + ex.MatchTimeout);
}

Console.WriteLine("AppDomain Default: {0}",
    AppDomain.CurrentDomain.GetData("REGEX_DEFAULT_MATCH_TIMEOUT"));
Console.WriteLine("Stopwatch: " + sw.Elapsed);

Closing Remarks

MSDN recommends setting a time-out value in all regular expression pattern-matching operations. However, it doesn't draw your attention to the issues to be aware of when doing so. I don't recommend setting an AppDomain default and calling it a day. You need to know your input and know your patterns. If the input is large, or the pattern is complex, an appropriate timeout value should be used. This might also entail measuring your performance-critical regex usages to assign sane defaults. Arbitrarily assigning a timeout value to a regex that used to work fine may cause it to break if the value isn't long enough. Measure existing usages before assigning a value if you think it might abort the matching attempt too early.
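
Measuring an existing usage could look something like the sketch below. It is only one possible approach: RegexTiming, MeasureWorstCase and SuggestTimeout are hypothetical names, and the safety factor and floor are tuning values you would pick for your own workload, not recommendations from MSDN.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical helper, used only for illustration.
public static class RegexTiming
{
    // Runs a known-good match several times with no timeout and returns
    // the slowest observed run.
    public static TimeSpan MeasureWorstCase(string input, string pattern, int iterations)
    {
        var re = new Regex(pattern, RegexOptions.None, Regex.InfiniteMatchTimeout);
        var worst = TimeSpan.Zero;
        for (int i = 0; i < iterations; i++)
        {
            var sw = Stopwatch.StartNew();
            re.IsMatch(input);
            sw.Stop();
            if (sw.Elapsed > worst)
            {
                worst = sw.Elapsed;
            }
        }
        return worst;
    }

    // Scales the measured worst case by a safety factor, never going
    // below the given minimum.
    public static TimeSpan SuggestTimeout(TimeSpan worstCase, int safetyFactor, TimeSpan minimum)
    {
        var scaled = TimeSpan.FromTicks(worstCase.Ticks * safetyFactor);
        return scaled > minimum ? scaled : minimum;
    }
}
```

Feeding representative, worst-case-sized input into MeasureWorstCase and passing the result to SuggestTimeout gives a starting value that is grounded in observed behavior instead of a guess.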

Moreover, this feature is useful when handling user-supplied patterns. Still, learning how to write proper patterns that perform well is important. Slapping a timeout on a regex to make up for a lack of knowledge in proper pattern construction isn't good practice.
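
For the user-supplied case, a wrapper along these lines keeps the exception handling out of calling code. It is a sketch, not a definitive implementation: MatchOutcome and UserPatternMatcher are hypothetical names, and the caller is assumed to pass a valid matchTimeout.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical result type, used only for illustration.
public enum MatchOutcome { Matched, NotMatched, TimedOut, InvalidPattern }

// Hypothetical helper, used only for illustration.
public static class UserPatternMatcher
{
    // Matches input against an untrusted pattern under a time limit and
    // reports the result as a value instead of throwing.
    public static MatchOutcome TryIsMatch(string input, string userPattern, TimeSpan matchTimeout)
    {
        try
        {
            return Regex.IsMatch(input, userPattern, RegexOptions.None, matchTimeout)
                ? MatchOutcome.Matched
                : MatchOutcome.NotMatched;
        }
        catch (RegexMatchTimeoutException)
        {
            // The pattern ran longer than matchTimeout on this input.
            return MatchOutcome.TimedOut;
        }
        catch (ArgumentException ex)
            when (!(ex is ArgumentOutOfRangeException) && !(ex is ArgumentNullException))
        {
            // The pattern could not be parsed. Bad timeouts and null
            // arguments are caller errors and still propagate.
            return MatchOutcome.InvalidPattern;
        }
    }
}
```

A caller can then treat TimedOut as a signal to reject or log the pattern, which is different from treating it as a non-match.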
