Why is Regex CompileToAssembly giving slower performance than compiled regex and interpreted regex?


Question

I am using the following code to compare the performance of CompileToAssembly against a compiled regex, but the results do not make sense. Please let me know what I am missing. Thanks!

    static readonly Regex regex = new Regex(@"(stats|pause\s?(all|\d+(\,\d+)*)|start\s?(all|\d+(\,\d+)*)|add\s?time\s?(all|\d+(\,\d+)*)(\s\d+)|c(?:hange)?\s?p(?:asskey)?|close)(.*)", RegexOptions.Compiled);
    static readonly Regex reg = new Regex(@"(stats|pause\s?(all|\d+(\,\d+)*)|start\s?(all|\d+(\,\d+)*)|add\s?time\s?(all|\d+(\,\d+)*)(\s\d+)|c(?:hange)?\s?p(?:asskey)?|close)(.*)");
    static readonly Regex level4 = new DuplicatedString();

    static void Main()
    {
        const string str = "add time 243,3453,43543,543,534534,54534543,345345,4354354235,345435,34543534 6873brekgnfkjerkgiengklewrij";
        const int itr = 1000000;
        CompileToAssembly();
        Match match;
        Stopwatch sw = new Stopwatch();
        sw.Start();
        for (int i = 0; i < itr; i++)
        {
             match = regex.Match(str);
        }
        sw.Stop();
        Console.WriteLine("RegexOptions.Compiled: {0}ms", sw.ElapsedMilliseconds);

        sw.Reset();
        sw.Start();
        for (int i = 0; i < itr; i++)
        {
            match = level4.Match(str);
        }
        sw.Stop();

        Console.WriteLine("CompiledToAssembly: {0}ms", sw.ElapsedMilliseconds);

        sw.Reset();
        sw.Start();
        for (int i = 0; i < itr; i++)
        {
            match = reg.Match(str);
        }
        sw.Stop();
        Console.WriteLine("Interpreted: {0}ms", sw.ElapsedMilliseconds);
        Console.ReadLine();
    }

    public static void CompileToAssembly()
    {
        RegexCompilationInfo expr;
        List<RegexCompilationInfo> compilationList = new List<RegexCompilationInfo>();

        // Define regular expression to detect duplicate words
        expr = new RegexCompilationInfo(@"(stats|pause\s?(all|\d+(\,\d+)*)|start\s?(all|\d+(\,\d+)*)|add\s?time\s?(all|\d+(\,\d+)*)(\s\d+)|c(?:hange)?\s?p(?:asskey)?|close)(.*)",
                   RegexOptions.Compiled,
                   "DuplicatedString",
                   "Utilities.RegularExpressions",
                   true);
        // Add info object to list of objects
        compilationList.Add(expr);

        // Apply AssemblyTitle attribute to the new assembly
        //
        // Define the parameter(s) of the AssemblyTitle attribute's constructor 
        Type[] parameters = { typeof(string) };
        // Define the assembly's title
        object[] paramValues = { "General-purpose library of compiled regular expressions" };
        // Get the ConstructorInfo object representing the attribute's constructor
        ConstructorInfo ctor = typeof(System.Reflection.AssemblyTitleAttribute).GetConstructor(parameters);
        // Create the CustomAttributeBuilder object array
        CustomAttributeBuilder[] attBuilder = { new CustomAttributeBuilder(ctor, paramValues) };

        // Generate assembly with compiled regular expressions
        RegexCompilationInfo[] compilationArray = new RegexCompilationInfo[compilationList.Count];
        AssemblyName assemName = new AssemblyName("RegexLib, Version=1.0.0.1001, Culture=neutral, PublicKeyToken=null");
        compilationList.CopyTo(compilationArray);
        Regex.CompileToAssembly(compilationArray, assemName, attBuilder);
    }

The results are as follows:

RegexOptions.Compiled: 3908ms
CompiledToAssembly: 59349ms
Interpreted: 5653ms

Solution

Your code has a problem: static field initializers run before the class's static methods do. That means that level4 has already been assigned by the time Main() runs, so the object referred to by level4 is not an instance of the class created by the CompileToAssembly() call inside Main().
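
The ordering can be observed with a small stand-alone program. This is only a sketch: InitLog, InitOrderDemo and Create are made-up names, and the explicit static constructor is an addition (the code in the question has none) that makes the initialization point deterministic for the class containing Main.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical log shared by the demo; not part of the original question.
static class InitLog
{
    public static readonly List<string> Events = new List<string>();
}

class InitOrderDemo
{
    // Stand-in for "static readonly Regex level4 = new DuplicatedString();"
    static readonly string level4 = Create("level4 initializer");

    // Assumption: an explicit static constructor is added here so that the
    // field initializers are guaranteed to run before Main is entered.
    static InitOrderDemo()
    {
        InitLog.Events.Add("static constructor");
    }

    static string Create(string label)
    {
        InitLog.Events.Add(label);
        return label;
    }

    public static void Main()
    {
        // By the time this line runs, level4 has already been assigned.
        InitLog.Events.Add("Main body");
        Console.WriteLine(string.Join(" -> ", InitLog.Events));
    }
}
```

Run as the entry point, the log should list the field initializer first, then the static constructor, and only then the body of Main, which is why nothing done inside Main can influence what level4 was assigned.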

Note that the example code for Regex.CompileToAssembly shows the compilation of the regex and its consumption in two different programs. The actual regex you're timing as "CompiledToAssembly" could therefore be a different regex that you compiled in an earlier test.
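
If both steps have to stay in one program, one possible approach is to drop the static level4 field and create the instance by reflection after the assembly has been written. The following is a minimal sketch under several assumptions: FreshRegexLoader and its methods are hypothetical names, the project no longer references an older RegexLib.dll at compile time, and Regex.CompileToAssembly is available (it exists on .NET Framework; on .NET Core and later it throws PlatformNotSupportedException).

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative, not taken from the question.
static class FreshRegexLoader
{
    // Creates an instance of a Regex-derived type found in the given assembly.
    public static Regex Create(Assembly assembly, string fullTypeName)
    {
        // throwOnError: true, so a wrong type name fails loudly instead of returning null.
        Type type = assembly.GetType(fullTypeName, true);
        return (Regex)Activator.CreateInstance(type);
    }

    // Loads the assembly from disk first, then creates the regex instance.
    public static Regex LoadFromFile(string assemblyPath, string fullTypeName)
    {
        Assembly assembly = Assembly.LoadFrom(Path.GetFullPath(assemblyPath));
        return Create(assembly, fullTypeName);
    }
}
```

In the question's Main, the call might then look like `Regex level4 = FreshRegexLoader.LoadFromFile("RegexLib.dll", "Utilities.RegularExpressions.DuplicatedString");` placed after CompileToAssembly(), assuming the generated file is named after the AssemblyName and written to the current directory.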

Another factor to consider: the overhead of loading an assembly into memory and jitting it to machine code might be significant enough that you need more than 1,000,000 iterations to see a benefit.
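
One way to keep such one-time costs out of the comparison is to make an untimed warm-up call before starting the stopwatch. The sketch below assumes a hypothetical helper, RegexTiming.MeasureMilliseconds, and reuses the pattern and input from the question. It only compares RegexOptions.Compiled with the interpreted regex, but a regex loaded from a precompiled assembly could be passed to the same helper.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class RegexTiming
{
    // Hypothetical helper: times `iterations` calls to Match after one untimed warm-up call.
    public static long MeasureMilliseconds(Regex regex, string input, int iterations)
    {
        // Warm-up: assembly loading, JIT compilation and lazy regex setup happen here,
        // outside the timed region.
        regex.Match(input);

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            regex.Match(input);
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const string pattern = @"(stats|pause\s?(all|\d+(\,\d+)*)|start\s?(all|\d+(\,\d+)*)|add\s?time\s?(all|\d+(\,\d+)*)(\s\d+)|c(?:hange)?\s?p(?:asskey)?|close)(.*)";
        const string str = "add time 243,3453,43543,543,534534,54534543,345345,4354354235,345435,34543534 6873brekgnfkjerkgiengklewrij";
        const int itr = 1000000;

        Regex compiled = new Regex(pattern, RegexOptions.Compiled);
        Regex interpreted = new Regex(pattern);

        Console.WriteLine("RegexOptions.Compiled: {0}ms", MeasureMilliseconds(compiled, str, itr));
        Console.WriteLine("Interpreted: {0}ms", MeasureMilliseconds(interpreted, str, itr));
    }
}
```

The absolute numbers depend on the machine and runtime, so they are best read as a relative comparison between the variants.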
