Why are SIMD instructions not used in the kernel?

Question

I couldn't find much use of SIMD instructions (like SSE/AVX) in the kernel, except in one place where they are used to speed up RAID6 parity computation.

Q1) Is there a specific reason for this, or is it just a lack of use cases?

Q2) What needs to be done today if I want to use SIMD instructions in, say, a device driver?

Q3) How hard would it be to incorporate a framework like ISPC into the kernel (just for experimentation)?

Recommended answer

Saving/restoring FPU state (including the SIMD vector registers) is more expensive than saving/restoring just the integer general-purpose (GP) registers. In most cases it simply isn't worth the cost.
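To put rough numbers on "more expensive", here is a back-of-the-envelope sketch that counts only architectural register bytes on x86-64. It is not a measurement: it ignores the XSAVE header, the legacy x87 area, and the cost of the save/restore instructions themselves, so the real gap is larger. The function names are hypothetical.

```c
#include <assert.h>

/* x86-64 has 16 general-purpose registers, 64 bits each. */
unsigned gp_state_bytes(void)
{
    return 16u * 64u / 8u;
}

/* Payload size of a register file of nregs registers, each width_bits wide.
 * Counts register contents only, not XSAVE headers or x87 state. */
unsigned vec_state_bytes(unsigned nregs, unsigned width_bits)
{
    assert(width_bits % 8u == 0u);
    return nregs * width_bits / 8u;
}
```

With these, gp_state_bytes() is 128 bytes, while 16 xmm registers are 256 bytes, 16 ymm registers are 512 bytes, and AVX-512's 32 zmm registers plus 8 64-bit opmask registers come to 2112 bytes, i.e. more than 16x the GP state before any bookkeeping.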

In Linux kernel code, all you have to do is call kernel_fpu_begin() / kernel_fpu_end() around the code that uses SIMD registers. This is what the RAID drivers do. See http://yarchive.net/comp/linux/kernel_fp.html.
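A minimal sketch of that bracketing pattern, modeled loosely on the RAID parity case (XOR one buffer into another). Assumptions: xor_block is a hypothetical name; so that this compiles as ordinary user-space C, kernel_fpu_begin()/kernel_fpu_end() are replaced by no-op stand-ins (in an x86 module the real ones come from <asm/fpu/api.h>); and SSE2 intrinsics are used for readability. In an actual x86 kernel build, arch/x86/Makefile compiles with -mno-sse and related flags, so intrinsics will not build without per-file CFLAGS changes; the kernel's own RAID6 SSE code uses inline asm instead. Also note that kernel_fpu_begin() disables preemption, so the bracketed region must not sleep.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical user-space stand-ins. In an x86 kernel module these are the
 * real kernel_fpu_begin()/kernel_fpu_end() from <asm/fpu/api.h>; here they
 * are no-ops so the sketch compiles and runs outside the kernel. */
static inline void kernel_fpu_begin(void) {}
static inline void kernel_fpu_end(void) {}

/* RAID-style XOR parity: dst[i] ^= src[i] for len bytes. */
void xor_block(uint8_t *dst, const uint8_t *src, size_t len)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL);
#if defined(__SSE2__)
    kernel_fpu_begin();   /* in the kernel: vector regs usable only after this */
    for (; i + 16 <= len; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(dst + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_xor_si128(a, b));
    }
    kernel_fpu_end();     /* in the kernel: vector regs off-limits after this */
#endif
    /* Scalar tail, or the whole buffer when SSE2 is unavailable. */
    for (; i < len; i++)
        dst[i] ^= src[i];
}
```

The scalar loop doubles as a fallback, which mirrors how kernel code typically keeps a non-SIMD path for CPUs or contexts where the vector unit cannot be used.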

x86 doesn't have any future-proof way to save/restore just one or a couple of vector registers. (The only exception is manually saving/restoring an xmm register with legacy SSE instructions, which can cause SSE/AVX transition stalls on Intel CPUs if user space left the upper halves of any ymm/zmm registers dirty.)

The reason legacy SSE works is that some Windows drivers were already doing this when Intel wanted to introduce AVX, so instead of having legacy SSE instructions zero the upper 128 bits of the ymm registers, Intel invented the transition-penalty mechanism. (See this for more detail on that design decision.) So basically we can blame binary-only Windows drivers for the SSE/AVX transition-penalty mess.

IDK about non-x86 architectures, or whether their existing SIMD instruction sets have a future-proof way to save/restore a register that will keep working for longer vectors. ARM32 might, if extensions continue the pattern of using multiple 32-bit FP registers as a single wider register (e.g. q2 is composed of s8 through s11). Saving/restoring a couple of q registers would then be future-proof, as long as a 256b NEON extension simply lets you use 2 q registers as one 256b register, or the new wider vectors are separate registers that don't extend the existing ones.
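The ARM32 aliasing this argument relies on can be written out as index arithmetic. This sketch assumes the NEON register file with 32 d registers; the function names are hypothetical. It shows why q2 covers s8 through s11, and also that q8 through q15 have no single-precision aliases at all, because only d0 to d15 overlap s registers.

```c
#include <assert.h>

/* ARM32 VFP/NEON aliasing:
 *   q<n> = d<2n> : d<2n+1>   for n = 0..15
 *   d<m> = s<2m> : s<2m+1>   for m = 0..15 only (d16..d31 have no s aliases)
 * so q<n> overlaps s<4n> .. s<4n+3>, but only for n = 0..7. */

/* First d register that makes up q<n>. */
int q_first_d(int n)
{
    assert(n >= 0 && n <= 15);
    return 2 * n;
}

/* First s register overlapped by q<n>, or -1 if q<n> has no s aliases. */
int q_first_s(int n)
{
    assert(n >= 0 && n <= 15);
    return n < 8 ? 4 * n : -1;
}
```

For example, q_first_d(2) is 4 and q_first_s(2) is 8, so q2 = d4:d5 = s8..s11, matching the answer.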
