Can ARM and NEON work in parallel?

Question

This is in reference to the question: Checksum code implementation for NEON in intrinsics

I am opening the sub-questions listed in that link as separate, individual questions, since multiple questions shouldn't be asked in a single thread.

Anyway, coming to the question:

Can ARM and NEON (speaking in terms of the ARM Cortex-A8 architecture) actually work in parallel? How can I achieve this?

Could someone point me to, or share, some sample implementations (pseudo-code, algorithms, or code, not theoretical papers or talks) that use ARM and NEON instructions together? Implementations using either intrinsics or inline asm will do.

Recommended answer

The answer depends on the ARM CPU. The Cortex-A8, for example, uses a coprocessor to implement the NEON and VFP instructions, and that coprocessor is connected to the ARM core via a FIFO. When the instruction decoder detects a NEON or VFP instruction, it simply places it into the FIFO. The NEON coprocessor fetches instructions from the FIFO and executes them. The NEON/VFP coprocessor thus lags behind the ARM core a bit, by up to 20 cycles or so on the Cortex-A8.

Usually, you don't need to care about that delay, unless you attempt to transfer data back from the NEON/VFP coprocessor to the main ARM core. (It doesn't matter much whether you do that by moving from a NEON/VFP register into an ARM register, or by using ARM instructions to read memory that has recently been written by NEON instructions.) In that case, the main ARM core is stalled until the NEON core has emptied the FIFO, i.e. for up to 20 cycles or so.
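One way to act on the point about transfers is to keep running results in NEON registers and move them to the ARM side only once, after the loop. The following is a minimal sketch, not code from the original answer: `sum_bytes` is a hypothetical helper, and the NEON path is compiled only when the compiler defines the NEON feature macro, with a plain C fallback otherwise.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: returns the sum of all bytes in buf (modulo 2^32). */
uint32_t sum_bytes(const uint8_t *buf, size_t len)
{
    uint32_t total = 0;
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* The accumulator lives in a NEON register for the whole loop,
       so no NEON -> ARM transfer happens per iteration. */
    uint32x4_t acc = vdupq_n_u32(0);
    for (; i + 16 <= len; i += 16) {
        uint8x16_t bytes = vld1q_u8(buf + i);
        uint16x8_t pairs = vpaddlq_u8(bytes);   /* 16 x u8 -> 8 x u16 */
        acc = vpadalq_u16(acc, pairs);          /* accumulate into 4 x u32 */
    }
    /* The only NEON -> ARM transfers: once, after the loop. */
    uint64x2_t wide = vpaddlq_u32(acc);
    total = (uint32_t)(vgetq_lane_u64(wide, 0) + vgetq_lane_u64(wide, 1));
#endif

    /* Scalar tail (and the whole job when NEON is unavailable). */
    for (; i < len; ++i)
        total += buf[i];
    return total;
}
```

Calling `sum_bytes(data, n)` gives the same result on both paths. On a Cortex-A8 the stall described above would then be paid once per call rather than once per 16-byte block, though how much that matters in practice is something to measure rather than assume.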

The ARM core can usually enqueue NEON/VFP instructions faster than the NEON/VFP coprocessor can execute them. You can exploit that to have both cores work in parallel by suitably interleaving your instructions. For example, insert one ARM instruction after every block of two or three NEON instructions, or perhaps two ARM instructions if you also want to exploit ARM's dual-issue capability. You will have to use inline assembly to do this: if you use intrinsics, the exact scheduling of the instructions is up to the compiler, and whether it has the smarts to interleave them suitably is anybody's guess. Your code will look something like:

<neon instruction>
<neon instruction>
<neon instruction>
<arm instruction>
<arm instruction>
<neon instruction>
...

I don't have a code sample at hand, but if you're somewhat familiar with ARM assembly, interleaving the instructions shouldn't be much of a challenge. After you're done, be sure to use an instruction-level profiler to check that things actually work as intended. You should see virtually no time spent on the ARM instructions.
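As a rough illustration of the interleaving idea, here is a minimal sketch using GCC-style inline assembly. It is not from the original answer and has not been tuned or profiled on a Cortex-A8. The function name `add_and_xor` and the choice of work are assumptions: NEON adds two byte arrays 16 bytes at a time while the ARM core XORs one word per block from a separate, non-overlapping array. The assembly path is used only on 32-bit ARM with NEON enabled; elsewhere a plain C fallback computes the same result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: dst[i] = a[i] + b[i] over blocks*16 bytes (NEON work),
   while XOR-ing `blocks` words from `words` (ARM work). `words` must not
   overlap `dst`, otherwise the ARM loads would wait on the NEON stores. */
uint32_t add_and_xor(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                     const uint32_t *words, size_t blocks)
{
    uint32_t acc = 0;

#if defined(__GNUC__) && defined(__arm__) && \
    (defined(__ARM_NEON) || defined(__ARM_NEON__))
    uint32_t w;
    while (blocks--) {
        __asm__ volatile(
            "vld1.8   {q0}, [%[a]]!          \n\t"  /* NEON: load 16 bytes of a */
            "vld1.8   {q1}, [%[b]]!          \n\t"  /* NEON: load 16 bytes of b */
            "ldr      %[w], [%[words]], #4   \n\t"  /* ARM:  load next word     */
            "vadd.i8  q0, q0, q1             \n\t"  /* NEON: byte-wise add      */
            "eor      %[acc], %[acc], %[w]   \n\t"  /* ARM:  update XOR         */
            "vst1.8   {q0}, [%[dst]]!        \n\t"  /* NEON: store 16 bytes     */
            : [a] "+r"(a), [b] "+r"(b), [dst] "+r"(dst),
              [words] "+r"(words), [acc] "+r"(acc), [w] "=&r"(w)
            :
            : "q0", "q1", "memory");
    }
#else
    /* Portable fallback with the same result, no interleaving. */
    while (blocks--) {
        for (int k = 0; k < 16; ++k)
            dst[k] = (uint8_t)(a[k] + b[k]);
        acc ^= *words++;
        a += 16;
        b += 16;
        dst += 16;
    }
#endif
    return acc;
}
```

The ARM instructions here never read anything the NEON instructions produce, which is the property that lets the two sides run ahead of each other. Whether this particular ordering actually helps on a given core is exactly what the instruction-level profiling mentioned above should confirm.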

Remember that other ARMv7 implementations might implement NEON completely differently. It seems, for example, that the Cortex-A9 has moved NEON closer to the ARM core and has a much lower penalty on data movements from NEON/VFP back to ARM. Whether or not this affects parallel scheduling of instructions I do not know, but it's definitely something to watch out for.
