NEON simple vector assignment intrinsic?
Problem description
With r1, r3 and r4 of type uint32x4_t loaded into NEON registers, I have the following code:
r3 = veorq_u32(r0,r3);
r4 = r1;
r1 = vandq_u32(r1,r3);
r4 = veorq_u32(r4,r2);
r1 = veorq_u32(r1,r0);
I was wondering whether GCC actually translates r4 = r1 into a vmov instruction. Looking at the disassembled code, I wasn't surprised that it didn't. (Moreover, I can't figure out what the generated assembly code actually does.)
Skimming through ARM's NEON intrinsics reference, I couldn't find any simple vector-to-vector assignment intrinsic.
What's the easiest way to achieve this? I'm not sure what inline assembly would look like, since I don't know which registers vld1q_u32 assigned r1 and r4 to. I don't need an actual swap, just an assignment.
Recommended answer
C has a concept of an abstract machine. Assignments and other operations are described in terms of this abstract machine. The assignment r4 = r1; says to give r4 the value of r1 in the abstract machine.
When the compiler generates instructions for a program, it generally does not mimic everything that occurs in the abstract machine step by step. It translates the operations of the abstract machine into processor instructions that produce the same results. The compiler will skip things like move instructions if it can determine that the same results are obtained without them.
In particular, the compiler might not keep r1 in the same place every time. It might load it from memory into some register, say R7, the first time you need it (the register names here are only illustrative). It might then implement your statement r1 = vandq_u32(r1,r3); by putting the result in R8 while keeping the original value of r1 in R7. Later, when it reaches r4 = veorq_u32(r4,r2);, the compiler can use the value in R7, because R7 still holds the value that r4 has in the abstract machine (from the r4 = r1; statement).
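The point can be sketched in C by writing the same data flow twice: once with the named copy from the question, and once with the copy folded away the way a register allocator might do it. This is only an illustration, not what GCC literally emits. The struct state_t and the function names are hypothetical, and when the code is not built for NEON it falls back to GCC/Clang vector-extension stand-ins for the two intrinsics, which is an assumption made so the sketch builds elsewhere.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#else
/* Assumption: off ARM, GCC/Clang vector extensions stand in for the
   NEON type and the two intrinsics used below. */
typedef uint32_t uint32x4_t __attribute__((vector_size(16)));
static inline uint32x4_t veorq_u32(uint32x4_t a, uint32x4_t b) { return a ^ b; }
static inline uint32x4_t vandq_u32(uint32x4_t a, uint32x4_t b) { return a & b; }
#endif

/* Hypothetical holder for the three values the snippet updates. */
typedef struct { uint32x4_t r1, r3, r4; } state_t;

/* The statements from the question, including r4 = r1. */
static state_t step_with_copy(uint32x4_t r0, uint32x4_t r1,
                              uint32x4_t r2, uint32x4_t r3)
{
    uint32x4_t r4;
    r3 = veorq_u32(r0, r3);
    r4 = r1;
    r1 = vandq_u32(r1, r3);
    r4 = veorq_u32(r4, r2);
    r1 = veorq_u32(r1, r0);
    return (state_t){ r1, r3, r4 };
}

/* Same data flow with no copy: the old r1 is simply read again. */
static state_t step_without_copy(uint32x4_t r0, uint32x4_t r1,
                                 uint32x4_t r2, uint32x4_t r3)
{
    uint32x4_t t3 = veorq_u32(r0, r3);
    uint32x4_t t4 = veorq_u32(r1, r2);                 /* old r1, no move */
    uint32x4_t t1 = veorq_u32(vandq_u32(r1, t3), r0);  /* new r1 */
    return (state_t){ t1, t3, t4 };
}

/* Bytewise comparison of two vectors. */
static int same(uint32x4_t a, uint32x4_t b)
{
    return memcmp(&a, &b, sizeof a) == 0;
}

/* Returns 1 if both versions produce identical r1, r3 and r4. */
int steps_agree(void)
{
    const uint32_t in[4][4] = {
        { 0x0F0F0F0Fu, 0x12345678u, 0xFFFFFFFFu, 0x00000000u },
        { 0xDEADBEEFu, 0x01020304u, 0x80000001u, 0x55555555u },
        { 0xCAFEBABEu, 0xA5A5A5A5u, 0x00FF00FFu, 0x13579BDFu },
        { 0x11111111u, 0x22222222u, 0x44444444u, 0x88888888u }
    };
    uint32x4_t r[4];
    memcpy(r, in, sizeof r);  /* 4 vectors of 4 lanes each */

    state_t a = step_with_copy(r[0], r[1], r[2], r[3]);
    state_t b = step_without_copy(r[0], r[1], r[2], r[3]);
    return same(a.r1, b.r1) && same(a.r3, b.r3) && same(a.r4, b.r4);
}
```

Because both functions are observably equivalent, the compiler is free to generate code shaped like the second one for source written like the first, which is why no vmov need appear in the disassembly.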
Even if you explicitly wrote a vmov intrinsic, the compiler might not emit an instruction for it, as long as the instructions it does emit produce the same result in the end.
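As a rough illustration, the sketch below spells the copy as an intrinsic and compares it with the plain assignment. No dedicated register-to-register move intrinsic appears in the question, so this assumes vorrq_u32(r1, r1) as the "explicit move" (the register form of VMOV is an alias of VORR with both sources equal). The function names are hypothetical, and the non-NEON fallback again uses GCC/Clang vector extensions as stand-ins.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#else
/* Assumption: off ARM, GCC/Clang vector extensions stand in for the
   NEON type and the intrinsics used below. */
typedef uint32_t uint32x4_t __attribute__((vector_size(16)));
static inline uint32x4_t vorrq_u32(uint32x4_t a, uint32x4_t b) { return a | b; }
static inline uint32x4_t veorq_u32(uint32x4_t a, uint32x4_t b) { return a ^ b; }
#endif

/* Copy written as a plain C assignment. */
static uint32x4_t xor_after_assign(uint32x4_t r1, uint32x4_t r2)
{
    uint32x4_t r4 = r1;
    return veorq_u32(r4, r2);
}

/* Copy written as an intrinsic: x | x == x, so r4 gets the value of r1. */
static uint32x4_t xor_after_orr(uint32x4_t r1, uint32x4_t r2)
{
    uint32x4_t r4 = vorrq_u32(r1, r1);
    return veorq_u32(r4, r2);
}

/* Returns 1 if both spellings of the copy give the same result. */
int copies_agree(void)
{
    const uint32_t a[4] = { 0xDEADBEEFu, 0x01020304u, 0x80000001u, 0x55555555u };
    const uint32_t b[4] = { 0xCAFEBABEu, 0xA5A5A5A5u, 0x00FF00FFu, 0x13579BDFu };
    uint32x4_t r1, r2, x, y;
    memcpy(&r1, a, sizeof r1);
    memcpy(&r2, b, sizeof r2);

    x = xor_after_assign(r1, r2);
    y = xor_after_orr(r1, r2);
    return memcmp(&x, &y, sizeof x) == 0;
}
```

Both functions describe the same value for r4 in the abstract machine, so an optimizing compiler may compile them to identical code with no move at all.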