Performance of compiled-to-delegate Expression


Problem Description



I'm generating an expression tree that maps properties from a source object to a destination object, that is then compiled to a Func<TSource, TDestination, TDestination> and executed.
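
For context, here is a minimal sketch (not the mapper's actual generator, which is more general) of how such a tree can be assembled and compiled with System.Linq.Expressions for the types shown further down:

// Sketch only: builds a (right, left) => { ...; return left; } mapping lambda
// equivalent to the debug view below, then compiles it to a delegate.
using System;
using System.Linq.Expressions;

static Func<ComplexSourceType, ComplexDestinationType, ComplexDestinationType> BuildMap()
{
  var right = Expression.Parameter(typeof(ComplexSourceType), "right");
  var left = Expression.Parameter(typeof(ComplexDestinationType), "left");

  var body = Expression.Block(
    // left.ID = right.ID;
    Expression.Assign(Expression.Property(left, "ID"),
                      Expression.Property(right, "ID")),
    // left.Complex = new NestedDestinationType { ID = right.Complex.ID, Name = right.Complex.Name };
    Expression.Assign(
      Expression.Property(left, "Complex"),
      Expression.MemberInit(
        Expression.New(typeof(NestedDestinationType)),
        Expression.Bind(typeof(NestedDestinationType).GetProperty("ID"),
                        Expression.Property(Expression.Property(right, "Complex"), "ID")),
        Expression.Bind(typeof(NestedDestinationType).GetProperty("Name"),
                        Expression.Property(Expression.Property(right, "Complex"), "Name")))),
    // the block's last expression is its value: return left
    left);

  return Expression.Lambda<Func<ComplexSourceType, ComplexDestinationType, ComplexDestinationType>>(
    body, right, left).Compile();
}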

This is the debug view of the resulting LambdaExpression:

.Lambda #Lambda1<System.Func`3[MemberMapper.Benchmarks.Program+ComplexSourceType,MemberMapper.Benchmarks.Program+ComplexDestinationType,MemberMapper.Benchmarks.Program+ComplexDestinationType]>(
    MemberMapper.Benchmarks.Program+ComplexSourceType $right,
    MemberMapper.Benchmarks.Program+ComplexDestinationType $left) {
    .Block(
        MemberMapper.Benchmarks.Program+NestedSourceType $Complex$955332131,
        MemberMapper.Benchmarks.Program+NestedDestinationType $Complex$2105709326) {
        $left.ID = $right.ID;
        $Complex$955332131 = $right.Complex;
        $Complex$2105709326 = .New MemberMapper.Benchmarks.Program+NestedDestinationType();
        $Complex$2105709326.ID = $Complex$955332131.ID;
        $Complex$2105709326.Name = $Complex$955332131.Name;
        $left.Complex = $Complex$2105709326;
        $left
    }
}

Cleaned up it would be:

(left, right) =>
{
    left.ID = right.ID;
    var complexSource = right.Complex;
    var complexDestination = new NestedDestinationType();
    complexDestination.ID = complexSource.ID;
    complexDestination.Name = complexSource.Name;
    left.Complex = complexDestination;
    return left;
}

That's the code that maps the properties on these types:

public class NestedSourceType
{
  public int ID { get; set; }
  public string Name { get; set; }
}

public class ComplexSourceType
{
  public int ID { get; set; }
  public NestedSourceType Complex { get; set; }
}

public class NestedDestinationType
{
  public int ID { get; set; }
  public string Name { get; set; }
}

public class ComplexDestinationType
{
  public int ID { get; set; }
  public NestedDestinationType Complex { get; set; }
}

The manual code to do this is:

var destination = new ComplexDestinationType
{
  ID = source.ID,
  Complex = new NestedDestinationType
  {
    ID = source.Complex.ID,
    Name = source.Complex.Name
  }
};

The problem is that when I compile the LambdaExpression and benchmark the resulting delegate, it is about 10x slower than the manual version. I have no idea why that is. And the whole point of this is maximum performance without the tedium of manual mapping.

When I take code by Bart de Smet from his blog post on this topic and benchmark the manual version of calculating prime numbers versus the compiled expression tree, they are completely identical in performance.

What can cause this huge difference when the debug view of the LambdaExpression looks like what you would expect?

EDIT

As requested I added the benchmark I used:

public static ComplexDestinationType Foo;

static void Benchmark()
{

  var mapper = new DefaultMemberMapper();

  var map = mapper.CreateMap(typeof(ComplexSourceType),
                             typeof(ComplexDestinationType)).FinalizeMap();

  var source = new ComplexSourceType
  {
    ID = 5,
    Complex = new NestedSourceType
    {
      ID = 10,
      Name = "test"
    }
  };

  var sw = Stopwatch.StartNew();

  for (int i = 0; i < 1000000; i++)
  {
    Foo = new ComplexDestinationType
    {
      ID = source.ID + i,
      Complex = new NestedDestinationType
      {
        ID = source.Complex.ID + i,
        Name = source.Complex.Name
      }
    };
  }

  sw.Stop();

  Console.WriteLine(sw.Elapsed);

  sw.Restart();

  for (int i = 0; i < 1000000; i++)
  {
    Foo = mapper.Map<ComplexSourceType, ComplexDestinationType>(source);
  }

  sw.Stop();

  Console.WriteLine(sw.Elapsed);

  var func = (Func<ComplexSourceType, ComplexDestinationType, ComplexDestinationType>)
             map.MappingFunction;

  var destination = new ComplexDestinationType();

  sw.Restart();

  for (int i = 0; i < 1000000; i++)
  {
    Foo = func(source, new ComplexDestinationType());
  }

  sw.Stop();

  Console.WriteLine(sw.Elapsed);
}

The second one is understandably slower than doing it manually, as it involves a dictionary lookup and a few object instantiations, but the third one should be just as fast: it's the raw delegate that's being invoked there, and the cast from Delegate to Func happens outside the loop.

I tried wrapping the manual code in a function as well, but I recall that it didn't make a noticeable difference. Either way, a function call shouldn't add an order of magnitude of overhead.

I also do the benchmark twice to make sure the JIT isn't interfering.

EDIT

You can get the code for this project here:

https://github.com/JulianR/MemberMapper/

I used the Son-of-Strike (SOS) debugger extension as described in that blog post by Bart de Smet to dump the generated IL of the dynamic method:

IL_0000: ldarg.2 
IL_0001: ldarg.1 
IL_0002: callvirt 6000003 ComplexSourceType.get_ID()
IL_0007: callvirt 6000004 ComplexDestinationType.set_ID(Int32)
IL_000c: ldarg.1 
IL_000d: callvirt 6000005 ComplexSourceType.get_Complex()
IL_0012: brfalse IL_0043
IL_0017: ldarg.1 
IL_0018: callvirt 6000006 ComplexSourceType.get_Complex()
IL_001d: stloc.0 
IL_001e: newobj 6000007 NestedDestinationType..ctor()
IL_0023: stloc.1 
IL_0024: ldloc.1 
IL_0025: ldloc.0 
IL_0026: callvirt 6000008 NestedSourceType.get_ID()
IL_002b: callvirt 6000009 NestedDestinationType.set_ID(Int32)
IL_0030: ldloc.1 
IL_0031: ldloc.0 
IL_0032: callvirt 600000a NestedSourceType.get_Name()
IL_0037: callvirt 600000b NestedDestinationType.set_Name(System.String)
IL_003c: ldarg.2 
IL_003d: ldloc.1 
IL_003e: callvirt 600000c ComplexDestinationType.set_Complex(NestedDestinationType)
IL_0043: ldarg.2 
IL_0044: ret 

I'm no expert at IL, but this seems pretty straightforward and exactly what you would expect, no? Then why is it so slow? No weird boxing operations, no hidden instantiations, nothing. It's not exactly the same as the expression tree above, as there's also a null check on right.Complex now.

This is the code for the manual version (obtained through Reflector):

L_0000: ldarg.1 
L_0001: ldarg.0 
L_0002: callvirt instance int32 ComplexSourceType::get_ID()
L_0007: callvirt instance void ComplexDestinationType::set_ID(int32)
L_000c: ldarg.0 
L_000d: callvirt instance class NestedSourceType ComplexSourceType::get_Complex()
L_0012: brfalse.s L_0040
L_0014: ldarg.0 
L_0015: callvirt instance class NestedSourceType ComplexSourceType::get_Complex()
L_001a: stloc.0 
L_001b: newobj instance void NestedDestinationType::.ctor()
L_0020: stloc.1 
L_0021: ldloc.1 
L_0022: ldloc.0 
L_0023: callvirt instance int32 NestedSourceType::get_ID()
L_0028: callvirt instance void NestedDestinationType::set_ID(int32)
L_002d: ldloc.1 
L_002e: ldloc.0 
L_002f: callvirt instance string NestedSourceType::get_Name()
L_0034: callvirt instance void NestedDestinationType::set_Name(string)
L_0039: ldarg.1 
L_003a: ldloc.1 
L_003b: callvirt instance void ComplexDestinationType::set_Complex(class NestedDestinationType)
L_0040: ldarg.1 
L_0041: ret 

Looks identical to me...

EDIT

I followed the link in Michael B's answer about this topic. I tried implementing the trick in the accepted answer and it worked! If you want a summary of the trick: it creates a dynamic assembly and compiles the expression tree into a static method in that assembly, and for some reason that's 10x faster. A downside is that my benchmark classes were internal (actually, public classes nested in an internal one), and it threw an exception when I tried to access them because they weren't accessible. There doesn't seem to be a workaround for that, but I can simply detect whether the referenced types are internal or not and decide which compilation approach to use.
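
A rough sketch of that trick, assuming .NET Framework (LambdaExpression.CompileToMethod is not available on .NET Core) and public target types; the assembly, type, and method names here are made up for illustration:

// Compile the lambda into a static method on a type in a dynamic assembly,
// then bind a delegate to that baked method.
using System;
using System.Linq.Expressions;
using System.Reflection;
using System.Reflection.Emit;

static TDelegate CompileToStaticMethod<TDelegate>(Expression<TDelegate> lambda)
  where TDelegate : class
{
  var assembly = AppDomain.CurrentDomain.DefineDynamicAssembly(
    new AssemblyName("DynamicMaps"), AssemblyBuilderAccess.Run);
  var module = assembly.DefineDynamicModule("DynamicMapsModule");
  var type = module.DefineType("Maps", TypeAttributes.Public);
  var method = type.DefineMethod("Map", MethodAttributes.Public | MethodAttributes.Static);

  lambda.CompileToMethod(method); // emits the tree as the body of the static method

  var bakedType = type.CreateType();
  return (TDelegate)(object)Delegate.CreateDelegate(
    typeof(TDelegate), bakedType.GetMethod("Map"));
}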

What still bugs me, though, is why that prime numbers method is identical in performance to the compiled expression tree.

And again, I welcome anyone to run the code at that GitHub repository to confirm my measurements and to make sure I'm not crazy :)

Solution

This is pretty strange for such a huge overhead. There are a few things to take into account. First, the VS-compiled code has different properties applied to it that might influence the jitter to optimize differently.

Are you including the first execution of the compiled delegate in these results? You shouldn't; ignore the first execution of either code path. You should also turn the normal code into a delegate, as delegate invocation is slightly slower than invoking an instance method, which is slower than invoking a static method.
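
For example, here is a sketch of how the comparison could be made delegate-to-delegate with an explicit warm-up, reusing the map, source, and Foo members from the benchmark above:

// Ignore the first execution of both paths, then time delegate vs. delegate.
Func<ComplexSourceType, ComplexDestinationType, ComplexDestinationType> manual =
  (src, dst) =>
  {
    dst.ID = src.ID;
    dst.Complex = new NestedDestinationType
    {
      ID = src.Complex.ID,
      Name = src.Complex.Name
    };
    return dst;
  };

var compiled = (Func<ComplexSourceType, ComplexDestinationType, ComplexDestinationType>)
               map.MappingFunction;

manual(source, new ComplexDestinationType());   // warm-up, not timed
compiled(source, new ComplexDestinationType()); // warm-up, not timed

var sw = Stopwatch.StartNew();
for (int i = 0; i < 1000000; i++)
  Foo = manual(source, new ComplexDestinationType());
sw.Stop();
Console.WriteLine("manual delegate:   " + sw.Elapsed);

sw.Restart();
for (int i = 0; i < 1000000; i++)
  Foo = compiled(source, new ComplexDestinationType());
sw.Stop();
Console.WriteLine("compiled delegate: " + sw.Elapsed);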

As for other differences, there is the fact that the compiled delegate has a closure object; it isn't being used here, but it means this is a target-bound delegate, which might perform a bit slower. You'll notice the compiled delegate has a target object and all the arguments are shifted down by one.

Also, methods generated by LCG (lightweight code generation) are considered static, and static methods tend to be slower when compiled to delegates than instance methods because of register-switching business. (Duffy said that the "this" pointer has a reserved register in the CLR, and when you have a delegate over a static method it has to be shifted to a different register, incurring a slight overhead.) Finally, code generated at runtime seems to run slightly slower than code generated by VS. Code generated at runtime seems to have extra sandboxing and is launched from a different assembly (try using something like the ldftn or calli opcode if you don't believe me; those Reflection.Emit-ed delegates will compile but won't let you actually execute them), which incurs a minimal overhead.

Also, you are running in release mode, right? There was a similar topic where we looked over this problem here: Why is Func<> created from Expression<Func<>> slower than Func<> declared directly?

Edit: Also see my answer here: DynamicMethod is much slower than compiled IL function

The main takeaway is that you should add the following code to the assembly where you plan to create and invoke run-time generated code.

[assembly: AllowPartiallyTrustedCallers]
[assembly: SecurityTransparent]
[assembly: SecurityRules(SecurityRuleSet.Level2,SkipVerificationInFullTrust=true)]

And always use a built-in delegate type or one from an assembly with those flags.

The reason is that anonymous dynamic code is hosted in an assembly that is always marked as partial trust. By allowing partially trusted callers you can skip part of the handshake. The transparency means that your code is not going to raise the security level (i.e. slow behavior). And finally, the real trick is to invoke a delegate type hosted in an assembly that is marked as skip-verification. Func<int,int>#Invoke is fully trusted, so no verification is needed. This will give you the performance of code generated by the VS compiler. By not using these attributes you are looking at an overhead in .NET 4. You might think that SecurityRuleSet.Level1 would be a good way to avoid this overhead, but switching security models is also expensive.

In short, add those attributes, and then your micro-loop performance test will run about the same.
