Can ARM and NEON work in parallel?

Problem description

This is with reference to the question: Checksum code implementation for Neon in Intrinsics (http://stackoverflow.com/questions/12066964/checksum-code-implementation-for-neon-in-intrinsics)

I am opening the sub-questions listed in that link as separate, individual questions, since multiple questions shouldn't be asked in a single thread.

Anyway, here is the question:

Can ARM and NEON (speaking in terms of the ARM Cortex-A8 architecture) actually work in parallel? How can I achieve this?

Could someone point me to, or share, some sample implementations (pseudo-code/algorithms/code, not theoretical papers or talks) that use ARM and NEON together? (Implementations with either intrinsics or inline assembly will do.)

Recommended answer

The answer depends on the ARM CPU. The Cortex-A8, for example, uses a coprocessor to implement the NEON and VFP instructions, which is connected to the ARM core via a FIFO. When the instruction decoder detects a NEON or VFP instruction, it simply places it into the FIFO. The NEON coprocessor fetches instructions from the FIFO and executes them. The NEON/VFP coprocessor thus lags behind a bit - on the Cortex-A8, by up to 20 cycles or so.

Usually, you don't need to care about that delay, unless you attempt to transfer data back from the NEON/VFP coprocessor to the main ARM core. (It doesn't matter much whether you do that by moving from a NEON/VFP register into an ARM register, or by using ARM instructions to read memory that has recently been written by NEON instructions.) In that case, the main ARM core is stalled until the NEON core has emptied the FIFO, i.e. up to 20 cycles or so.
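To illustrate the point about transfers back to the ARM core, here is a minimal sketch (not taken from the original answer) of a byte sum that keeps its running total in a NEON register for the whole loop and moves it to ARM registers only once, at the end. The function name byte_sum_deferred is hypothetical, and the scalar fallback exists only so the file also builds on targets without NEON. Whether the compiler really keeps the accumulator in a NEON register is up to the compiler.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: returns the sum of all bytes in buf, modulo 2^32. */
uint32_t byte_sum_deferred(const uint8_t *buf, size_t len)
{
    uint32_t total = 0;
    size_t i = 0;

#if defined(__ARM_NEON)
    /* Four 32-bit partial sums that are meant to stay on the NEON side. */
    uint32x4_t acc = vdupq_n_u32(0);

    for (; i + 16 <= len; i += 16) {
        /* Widen 16 bytes into 8 pairwise 16-bit sums ... */
        uint16x8_t pairs = vpaddlq_u8(vld1q_u8(buf + i));
        /* ... and accumulate them pairwise into the 32-bit lanes. */
        acc = vpadalq_u16(acc, pairs);
    }

    /* The only NEON -> ARM transfer: done once, after the loop. */
    uint32x2_t half = vadd_u32(vget_low_u32(acc), vget_high_u32(acc));
    total = vget_lane_u32(half, 0) + vget_lane_u32(half, 1);
#endif

    /* Scalar tail (or the whole buffer when NEON is unavailable). */
    for (; i < len; i++)
        total += buf[i];

    return total;
}
```

Reading a lane back inside the loop instead (for example, to test the running sum on every iteration) would be the pattern described above as forcing the ARM core to wait for the NEON FIFO to drain on a Cortex-A8.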

The ARM core can usually enqueue NEON/VFP instructions faster than the NEON/VFP coprocessor can execute them. You can exploit that to have both cores work in parallel by suitably interleaving your instructions. E.g., insert one ARM instruction after every block of two or three NEON instructions. Or maybe two ARM instructions if you also want to exploit ARM's dual-issue capability. You will have to use inline assembly to do this - if you use intrinsics, the exact scheduling of the instructions is up to the compiler, and whether it has the smarts to interleave them suitably is anybody's guess. Your code will look something like:

<neon instruction>
<neon instruction>
<neon instruction>
<arm instruction>
<arm instruction>
<neon instruction>
...
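As one possible concrete form of the pattern above, here is a hedged sketch using GCC-style inline assembly for 32-bit ARMv7 with NEON. It is not from the original answer and has not been profiled on a Cortex-A8. The function name sum_bytes_interleaved, the choice of registers, the 128-byte prefetch distance, and the ratio of NEON to ARM instructions are all illustrative assumptions. On other targets the function falls back to plain C.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the sum of all bytes in p[0..n), modulo 2^32. */
uint32_t sum_bytes_interleaved(const uint8_t *p, size_t n)
{
    uint32_t total = 0;

#if defined(__arm__) && defined(__ARM_NEON)
    uint32_t blocks = (uint32_t)(n / 32);   /* 32 bytes per loop iteration */
    n %= 32;

    if (blocks != 0) {
        uint32_t neon_sum;
        __asm__ volatile(
            "vmov.i32   q8, #0                   \n\t" /* NEON: clear accumulator     */
            "1:                                  \n\t"
            "vld1.8     {d0-d3}, [%[p]]!         \n\t" /* NEON: load 32 bytes         */
            "vpaddl.u8  q2, q0                   \n\t" /* NEON: bytes -> 16-bit sums  */
            "vpaddl.u8  q3, q1                   \n\t" /* NEON                        */
            "pld        [%[p], #128]             \n\t" /* ARM: prefetch ahead         */
            "subs       %[blocks], %[blocks], #1 \n\t" /* ARM: loop counter           */
            "vpadal.u16 q8, q2                   \n\t" /* NEON: accumulate to 32 bits */
            "vpadal.u16 q8, q3                   \n\t" /* NEON                        */
            "bne        1b                       \n\t" /* ARM: loop branch            */
            "vadd.i32   d16, d16, d17            \n\t" /* NEON: fold 4 lanes to 2     */
            "vpadd.i32  d16, d16, d16            \n\t" /* NEON: fold 2 lanes to 1     */
            "vmov.32    %[out], d16[0]           \n\t" /* single NEON -> ARM transfer */
            : [p] "+r"(p), [blocks] "+r"(blocks), [out] "=&r"(neon_sum)
            :
            : "q0", "q1", "q2", "q3", "q8", "cc", "memory");
        total = neon_sum;
    }
#endif

    /* Scalar tail (or the whole buffer on non-ARMv7/NEON targets). */
    while (n--)
        total += *p++;

    return total;
}
```

The ARM instructions here only do bookkeeping (prefetch, counter, branch), so there is little ARM-side work to overlap; in real code the ARM slots could carry any independent integer work that does not read results produced by the NEON instructions.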

I don't have a code sample at hand, but if you're somewhat familiar with ARM assembly, interleaving the instructions shouldn't be much of a challenge. After you're done, be sure to use an instruction-level profiler to check that things actually work as intended. You should see virtually no time spent on the ARM instructions.

Remember that other ARMv7 implementations might implement NEON completely differently. It seems, for example, that the Cortex-A9 has moved NEON closer to the ARM core, and has a much lower penalty on data movements from NEON/VFP back to ARM. Whether or not this affects parallel scheduling of instructions I do not know, but it's definitely something to watch out for.
