Data type compatibility with NEON intrinsics
Problem description
I am working on ARM optimizations using the NEON intrinsics, from C++ code. I understand and master most of the typing issues, but I am stuck on this one:
The instruction vzip_u8 returns a uint8x8x2_t value (in fact an array of two uint8x8_t). I want to assign the returned value to a plain uint16x8_t. I see no appropriate vreinterpretq intrinsic to achieve that, and simple casts are rejected.
Solution
Some definitions to answer clearly...
NEON has 32 registers, each 64 bits wide (with a dual view as 16 registers, each 128 bits wide).
The NEON unit can view the same register bank as:
- sixteen 128-bit quadword registers, Q0-Q15
- thirty-two 64-bit doubleword registers, D0-D31.
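To make the two views concrete, here is a minimal sketch of how they overlap, assuming the ARMv7 NEON layout in which quadword register Qn occupies doubleword registers D(2n) and D(2n+1). The helper names are hypothetical and exist only for illustration; they are not part of any ARM header.

```cpp
// Illustrative helpers (hypothetical names), assuming the ARMv7 NEON layout
// where Qn overlays the doubleword pair D(2n), D(2n+1).

// Index of the doubleword register holding the low half of Qq.
constexpr int low_d_of_q(int q) { return 2 * q; }

// Index of the doubleword register holding the high half of Qq.
constexpr int high_d_of_q(int q) { return 2 * q + 1; }

// True if the pair (d_lo, d_hi) can be viewed as a single quadword register:
// both indices in D0-D31, the first one even, the second immediately after it.
constexpr bool forms_q_register(int d_lo, int d_hi) {
    return d_lo >= 0 && d_hi <= 31 && d_lo % 2 == 0 && d_hi == d_lo + 1;
}
```

For example, Q3 overlays D6 and D7, while the pair D7, D8 is consecutive but does not line up with any Q register.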
uint16x8_t is a type which requires 128-bit storage, thus it needs to be in a quadword register.
The ARM NEON intrinsics have a definition called the vector array data type in the ARM® C Language Extensions:
"... for use in load and store operations, in table-lookup operations, and as the result type of operations that return a pair of vectors."
The VZIP instruction "... interleaves the elements of two vectors":
VZIP Dd, Dm
and it has a corresponding intrinsic:
uint8x8x2_t vzip_u8 (uint8x8_t, uint8x8_t)
From these we can conclude that uint8x8x2_t is actually a list of two arbitrarily numbered doubleword registers, because the VZIP instruction places no requirement on the numbering or order of its input registers.
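As an illustration of what the interleave produces, here is a portable scalar model of vzip_u8. It is not NEON code: the Vec8 and Vec8x2 types and the zip_model function are hypothetical stand-ins for uint8x8_t, uint8x8x2_t and the intrinsic, written only to show which input lane lands where.

```cpp
#include <array>
#include <cstdint>

using Vec8 = std::array<std::uint8_t, 8>;  // stand-in for uint8x8_t
struct Vec8x2 { Vec8 val[2]; };            // stand-in for uint8x8x2_t

// Scalar model of vzip_u8(a, b):
//   val[0] = a0 b0 a1 b1 a2 b2 a3 b3
//   val[1] = a4 b4 a5 b5 a6 b6 a7 b7
Vec8x2 zip_model(const Vec8& a, const Vec8& b) {
    Vec8x2 r{};
    for (int i = 0; i < 4; ++i) {
        r.val[0][2 * i]     = a[i];      // low halves of a and b, interleaved
        r.val[0][2 * i + 1] = b[i];
        r.val[1][2 * i]     = a[i + 4];  // high halves of a and b, interleaved
        r.val[1][2 * i + 1] = b[i + 4];
    }
    return r;
}
```

The result is two separate 64-bit vectors, which is why the intrinsic returns a vector array type and not a single 128-bit vector.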
Now the answer is...
uint8x8x2_t can contain two non-consecutive doubleword registers, while uint16x8_t is a data structure consisting of two consecutive doubleword registers, the first of which has an even index (D0-D31 -> Q0-Q15). Because of this, you can't cast a vector array data type holding two doubleword registers to a quadword register... easily. The compiler may be smart enough to assist you, or you can just force the conversion; however, I would check the resulting assembly for correctness as well as performance.
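One possible way to force the conversion, which the answer above does not spell out, is to join the two halves explicitly with vcombine_u8 and then reinterpret the resulting quadword with vreinterpretq_u16_u8. The sketch below assumes a little-endian target. The function names zip_to_u16x8 and zip_bytes_to_u16 are hypothetical, and the non-NEON branch is only a scalar stand-in so the snippet also builds on other machines. As suggested above, the generated assembly is still worth checking.

```cpp
#include <cstdint>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

// Interleave a and b, join the two 64-bit halves into one 128-bit vector,
// then view those 16 bytes as eight 16-bit lanes.
inline uint16x8_t zip_to_u16x8(uint8x8_t a, uint8x8_t b) {
    uint8x8x2_t z = vzip_u8(a, b);                   // pair of doubleword vectors
    uint8x16_t  q = vcombine_u8(z.val[0], z.val[1]); // low half, high half
    return vreinterpretq_u16_u8(q);                  // same bits, new lane type
}
#endif

// Hypothetical wrapper working on plain arrays. Assuming little-endian lanes,
// out[i] ends up holding a[i] in its low byte and b[i] in its high byte.
void zip_bytes_to_u16(const std::uint8_t a[8], const std::uint8_t b[8],
                      std::uint16_t out[8]) {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    vst1q_u16(out, zip_to_u16x8(vld1_u8(a), vld1_u8(b)));
#else
    // Scalar stand-in (assumption: mirrors the little-endian NEON result).
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<std::uint16_t>(a[i] | (b[i] << 8));
#endif
}
```

The explicit vcombine_u8 step gives the compiler a value that must live in a single quadword register, so it can allocate or move the two halves into a suitable consecutive D-register pair.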