RyuJIT not making full use of SIMD intrinsics

Question

I'm running some C# code that uses System.Numerics.Vector<T> but as far as I can tell I'm not getting the full benefit of SIMD intrinsics. I'm using Visual Studio Community 2015 with Update 1, and my clrjit.dll is v4.6.1063.1.

I'm running on an Intel Core i5-3337U processor, which implements the AVX instruction set extensions. So, I figure, I should be able to execute most SIMD instructions on 256-bit registers. For example, the disassembly should contain instructions like vmovups, vmovupd, vaddps, etc., and Vector<float>.Count should return 8, Vector<double>.Count should return 4, and so on. But that's not what I'm seeing.

Instead, my disassembly contains instructions like movups, movupd, addps, etc., and the following code:

WriteLine($"{Vector<byte>.Count} bytes per operation");
WriteLine($"{Vector<float>.Count} floats per operation");
WriteLine($"{Vector<int>.Count} ints per operation");
WriteLine($"{Vector<double>.Count} doubles per operation");

produces:

16 bytes per operation
4 floats per operation
4 ints per operation
2 doubles per operation
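
As a rough illustration of the arithmetic behind these numbers: the element count is simply the register width in bytes divided by the element size, so the output above corresponds to 16-byte (128-bit) registers, while the expected counts of 8 floats and 4 doubles would need 32-byte (256-bit) registers. The sketch below is plain C++ rather than anything from System.Numerics, and the helper names `vector_count` and `vector_count_of` are made up for this example.

```cpp
#include <cassert>
#include <cstddef>

// Elements per SIMD register: register width divided by element size.
// (Hypothetical helper; it only models the arithmetic, not the JIT.)
std::size_t vector_count(std::size_t register_bytes, std::size_t element_bytes) {
    // The register width must be a whole multiple of the element size.
    assert(element_bytes != 0 && register_bytes % element_bytes == 0);
    return register_bytes / element_bytes;
}

// Convenience wrapper that takes the element size from a C++ type.
template <typename T>
std::size_t vector_count_of(std::size_t register_bytes) {
    return vector_count(register_bytes, sizeof(T));
}
```

On a typical platform where float is 4 bytes and double is 8 bytes, `vector_count_of<float>(16)` gives 4 and `vector_count_of<double>(16)` gives 2, matching the output above, whereas a width of 32 gives 8 and 4.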

Where am I going wrong? To see all project settings etc. the project is available here.

Recommended answer

Your processor is a bit dated; its micro-architecture is Ivy Bridge, the "tick" that followed Sandy Bridge: a feature shrink without architectural changes. Your nemesis is this bit of code in RyuJIT, located in ee_il_dll.cpp, in the CILJit::getMaxIntrinsicSIMDVectorLength() function:

if (((cpuCompileFlags & CORJIT_FLG_PREJIT) == 0) &&
    ((cpuCompileFlags & CORJIT_FLG_FEATURE_SIMD) != 0) &&
    ((cpuCompileFlags & CORJIT_FLG_USE_AVX2) != 0))
{
    static ConfigDWORD fEnableAVX;
    if (fEnableAVX.val(CLRConfig::EXTERNAL_EnableAVX) != 0)
    {
        return 32;
    }
}

Note the use of CORJIT_FLG_USE_AVX2. Your processor does not support AVX2; that extension became available in Haswell, the next micro-architecture after Ivy Bridge and a "tock". Very nice processor btw, discoveries like this one have a major wow factor.
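
To make the consequence concrete, here is a minimal stand-alone sketch of that decision. It is not the RyuJIT source: the flag constants are hypothetical stand-ins for the CORJIT_FLG_* values, the config lookup is reduced to a bool parameter, and the 16-byte fallback is an assumption based on the 16-byte output reported in the question, since the excerpt above only shows the branch that returns 32.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits standing in for the CORJIT_FLG_* values;
// the real values live in the CLR headers and are not reproduced here.
enum SketchFlags : std::uint32_t {
    kPrejit      = 1u << 0,  // stands in for CORJIT_FLG_PREJIT
    kFeatureSimd = 1u << 1,  // stands in for CORJIT_FLG_FEATURE_SIMD
    kUseAvx2     = 1u << 2,  // stands in for CORJIT_FLG_USE_AVX2
};

// Returns the maximum SIMD vector width in bytes under this simplified model.
unsigned max_simd_vector_bytes(std::uint32_t flags, bool enable_avx_config) {
    const bool jitting = (flags & kPrejit) == 0;
    const bool simd    = (flags & kFeatureSimd) != 0;
    const bool avx2    = (flags & kUseAvx2) != 0;
    if (jitting && simd && avx2 && enable_avx_config) {
        return 32;  // 256-bit vectors
    }
    return 16;  // assumed fallback: 128-bit vectors
}

// Small self-check contrasting an AVX-only CPU with an AVX2-capable one.
void self_check() {
    assert(max_simd_vector_bytes(kFeatureSimd, true) == 16);             // AVX only
    assert(max_simd_vector_bytes(kFeatureSimd | kUseAvx2, true) == 32);  // AVX2
}
```

In this model a CPU with AVX but without AVX2 never sets the AVX2 flag, so the width stays at 16 bytes regardless of the config switch.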

Nothing you can do about this but go shopping. For inspiration, you can look at the kind of code it generates in this post.
