NEON simple vector assignment intrinsic?


Question

With r1, r3 and r4 of type uint32x4_t loaded into NEON registers, I have the following code:

r3 = veorq_u32(r0,r3);   
r4 = r1;    
r1 = vandq_u32(r1,r3);   
r4 = veorq_u32(r4,r2);   
r1 = veorq_u32(r1,r0);

I was wondering whether GCC actually translates r4 = r1 into a vmov instruction. Looking at the disassembled code, I wasn't surprised that it didn't. (Moreover, I can't figure out what the generated assembly code actually does.)

Skimming through ARM's NEON intrinsics reference, I couldn't find any simple vector-to-vector assignment intrinsic.

What's the easiest way to achieve this? I'm not sure what inline assembly would look like, since I don't know which registers r1 and r4 were assigned to by vld1q_u32. I don't need an actual swap, just an assignment.

Recommended answer

C has a concept of an abstract machine. Assignments and other operations are described in terms of this abstract machine. The assignment r4 = r1; says to assign the value of r1 to r4 in the abstract machine.

When the compiler generates instructions for a program, it generally does not mimic everything that occurs in the abstract machine step by step. It translates the operations of the abstract machine into processor instructions that produce the same results. The compiler will skip things like move instructions if it can determine that it gets the same results without them.

In particular, the compiler might not keep r1 in the same place every time. It might load it from memory into some register, say R7, the first time you need it. It might then implement your statement r1 = vandq_u32(r1,r3); by putting the result in another register, say R8, while keeping the original value of r1 in R7. Then, when you later have r4 = veorq_u32(r4,r2);, the compiler can use the value in R7, because it still contains the value that r4 would have in the abstract machine (from the r4 = r1; statement).
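To make this concrete, here is a minimal sketch (not the asker's actual program) that writes the sequence from the question twice: once with the r4 = r1; copy, and once with the copy folded away by reading the old r1 directly. The function names, the 20-word state layout, and the scalar fallback used when arm_neon.h is unavailable are all assumptions made for illustration. Both functions compute the same values, which is why a compiler is free to emit the second form for the first.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#else
/* Assumption: scalar stand-ins (not ARM's API) so the sketch builds off ARM. */
typedef struct { uint32_t lane[4]; } uint32x4_t;

static inline uint32x4_t vld1q_u32(const uint32_t *p) {
    uint32x4_t v;
    for (int i = 0; i < 4; i++) v.lane[i] = p[i];
    return v;
}
static inline void vst1q_u32(uint32_t *p, uint32x4_t v) {
    for (int i = 0; i < 4; i++) p[i] = v.lane[i];
}
static inline uint32x4_t veorq_u32(uint32x4_t a, uint32x4_t b) {
    for (int i = 0; i < 4; i++) a.lane[i] ^= b.lane[i];
    return a;
}
static inline uint32x4_t vandq_u32(uint32x4_t a, uint32x4_t b) {
    for (int i = 0; i < 4; i++) a.lane[i] &= b.lane[i];
    return a;
}
#endif

/* Hypothetical layout: s holds r0..r4 back to back, four lanes each. */

/* The sequence from the question; the copy is an ordinary C assignment. */
void mix_with_copy(uint32_t s[20]) {
    uint32x4_t r0 = vld1q_u32(s);
    uint32x4_t r1 = vld1q_u32(s + 4);
    uint32x4_t r2 = vld1q_u32(s + 8);
    uint32x4_t r3 = vld1q_u32(s + 12);
    uint32x4_t r4;

    r3 = veorq_u32(r0, r3);
    r4 = r1;                 /* plain assignment, no intrinsic needed */
    r1 = vandq_u32(r1, r3);
    r4 = veorq_u32(r4, r2);
    r1 = veorq_u32(r1, r0);

    vst1q_u32(s + 4, r1);
    vst1q_u32(s + 12, r3);
    vst1q_u32(s + 16, r4);
}

/* Same results with no copy at all: the old r1 is read before it is replaced. */
void mix_without_copy(uint32_t s[20]) {
    uint32x4_t r0 = vld1q_u32(s);
    uint32x4_t r1 = vld1q_u32(s + 4);
    uint32x4_t r2 = vld1q_u32(s + 8);
    uint32x4_t r3 = vld1q_u32(s + 12);

    uint32x4_t new_r3 = veorq_u32(r0, r3);
    uint32x4_t new_r4 = veorq_u32(r1, r2);
    uint32x4_t new_r1 = veorq_u32(vandq_u32(r1, new_r3), r0);

    vst1q_u32(s + 4, new_r1);
    vst1q_u32(s + 12, new_r3);
    vst1q_u32(s + 16, new_r4);
}

/* Returns 1 when both versions leave identical state for the given input. */
int same_result(const uint32_t in[20]) {
    uint32_t a[20], b[20];
    for (int i = 0; i < 20; i++) a[i] = b[i] = in[i];
    mix_with_copy(a);
    mix_without_copy(b);
    for (int i = 0; i < 20; i++) {
        if (a[i] != b[i]) return 0;
    }
    return 1;
}
```

Compiling this for an ARM target with optimization enabled and assembly output (for example GCC with -O2 -S, plus -mfpu=neon on 32-bit ARM) and comparing the two functions is one way to check what the compiler does with the copy. With optimization on, the two will often come out the same or nearly so, with no vmov for the assignment, though the exact output depends on the compiler version and flags.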

Even if you explicitly wrote a vmov intrinsic, the compiler might not issue an instruction for it, as long as the instructions it does issue get the same result in the end.
