RyuJIT not making full use of SIMD intrinsics
Question
I'm running some C# code that uses System.Numerics.Vector<T>, but as far as I can tell I'm not getting the full benefit of SIMD intrinsics. I'm using Visual Studio Community 2015 with Update 1, and my clrjit.dll is v4.6.1063.1.
I'm running on an Intel Core i5-3337U processor, which implements the AVX instruction set extensions. I would therefore expect most SIMD instructions to execute on 256-bit registers. For example, the disassembly should contain instructions like vmovups, vmovupd, vaddps, etc., Vector<float>.Count should return 8, Vector<double>.Count should return 4, and so on. But that's not what I'm seeing.
Instead, my disassembly contains instructions like movups, movupd, addps, etc., and the following code:
WriteLine($"{Vector<byte>.Count} bytes per operation");
WriteLine($"{Vector<float>.Count} floats per operation");
WriteLine($"{Vector<int>.Count} ints per operation");
WriteLine($"{Vector<double>.Count} doubles per operation");
produces:
16 bytes per operation
4 floats per operation
4 ints per operation
2 doubles per operation
Where am I going wrong? To see all project settings etc. the project is available here.
Recommended answer
Your processor is a bit dated; its microarchitecture is Ivy Bridge, the "tick" that followed Sandy Bridge: a die shrink without architectural changes. Your nemesis is this bit of code in RyuJIT, in the CILJit::getMaxIntrinsicSIMDVectorLength() function in ee_il_dll.cpp:
if (((cpuCompileFlags & CORJIT_FLG_PREJIT) == 0) &&
    ((cpuCompileFlags & CORJIT_FLG_FEATURE_SIMD) != 0) &&
    ((cpuCompileFlags & CORJIT_FLG_USE_AVX2) != 0))
{
    static ConfigDWORD fEnableAVX;
    if (fEnableAVX.val(CLRConfig::EXTERNAL_EnableAVX) != 0)
    {
        return 32;
    }
}
Note the use of CORJIT_FLG_USE_AVX2. Your processor does not support AVX2; that extension first became available in Haswell, the next microarchitecture after Ivy Bridge and a "tock". Because the AVX2 check fails, this branch never returns 32, which is consistent with the 16-byte vectors and SSE instructions you are seeing. Very nice processor btw, discoveries like this one have a major wow factor.
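The link between the vector length the JIT picks and the Vector<T>.Count values in the question can be sketched in C#. This is only an illustration: the class SimdWidthCheck and the helpers LanesFor and Report are hypothetical names, not part of any library. It assumes a runtime where System.Numerics.Vector<T> is available, and the values it prints depend on the machine and the JIT.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration only; not part of System.Numerics.
public static class SimdWidthCheck
{
    // Number of elements of the given size that fit in a register
    // of the given width (both in bytes).
    public static int LanesFor(int registerBytes, int elementBytes)
    {
        return registerBytes / elementBytes;
    }

    // Prints the lane count the JIT actually chose for T and the
    // register width in bytes that this count implies.
    public static void Report<T>(string name, int elementBytes) where T : struct
    {
        int lanes = Vector<T>.Count;
        int impliedBytes = lanes * elementBytes;
        Console.WriteLine($"{name}: {lanes} lanes, implies {impliedBytes}-byte vectors");
    }

    public static void Main()
    {
        // False means Vector<T> operations are not JIT-accelerated at all.
        Console.WriteLine($"Hardware accelerated: {Vector.IsHardwareAccelerated}");

        Report<byte>("byte", sizeof(byte));
        Report<float>("float", sizeof(float));
        Report<int>("int", sizeof(int));
        Report<double>("double", sizeof(double));

        // What each vector length would give for float:
        Console.WriteLine($"16-byte vectors: {LanesFor(16, sizeof(float))} floats");
        Console.WriteLine($"32-byte vectors: {LanesFor(32, sizeof(float))} floats");
    }
}
```

A 16-byte vector holds 4 floats or 2 doubles, which matches the output in the question; the 8 floats and 4 doubles the asker expected would require the 32-byte length that the snippet above returns only when the AVX2 flag is set.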
There is nothing you can do about this but go shopping. For inspiration, you can look at the kind of code it generates in this post.