Data type compatibility with NEON intrinsics

Views: 24 · Published: 2022/1/17 14:03:43 · Tags: gcc, arm, neon, intrinsics

Question

I am working on ARM optimizations using NEON intrinsics from C++ code. I understand and have mastered most of the typing issues, but I am stuck on this one:

The intrinsic vzip_u8 returns a uint8x8x2_t value (in fact an array of two uint8x8_t). I want to assign the returned value to a plain uint16x8_t. I see no appropriate vreinterpretq intrinsic to achieve that, and simple casts are rejected by the compiler.

Solution

First, some definitions, so that the answer is clear.

On 32-bit ARM (ARMv7 / AArch32), NEON has 32 registers, each 64 bits wide, which can also be viewed as 16 registers, each 128 bits wide.

The NEON unit can view the same register bank as:

sixteen 128-bit quadword registers, Q0-Q15

thirty-two 64-bit doubleword registers, D0-D31.

uint16x8_t is a type that requires 128 bits of storage, so it must live in a quadword register.

The ARM® C Language Extensions (ACLE) define a "vector array data type" for the NEON intrinsics:

... for use in load and store operations, in table-lookup operations, and as the result type of operations that return a pair of vectors.

The vzip instruction:

... interleaves the elements of two vectors.

vzip Dd, Dm

and has a corresponding intrinsic:

uint8x8x2_t vzip_u8 (uint8x8_t, uint8x8_t)

From these we can conclude that a uint8x8x2_t is really a pair of arbitrarily numbered doubleword registers, because the vzip instruction places no requirement on which registers it operates on.
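To make the lane layout concrete, the sketch below is a portable scalar model of what vzip_u8 returns. It is not the intrinsic itself. `U8x8x2` and `zip_u8_model` are hypothetical names. `U8x8x2` mimics the real vector array type, which is just a struct holding two 8-lane vectors in `val[0]` and `val[1]`.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for NEON's uint8x8x2_t: a struct holding two
// 8-lane byte vectors, accessed as val[0] and val[1].
struct U8x8x2 {
    std::array<std::uint8_t, 8> val[2];
};

// Scalar model of vzip_u8(a, b): interleave a and b as a0 b0 a1 b1 ... a7 b7.
// The first 8 bytes of that sequence go to val[0], the last 8 to val[1].
U8x8x2 zip_u8_model(const std::array<std::uint8_t, 8>& a,
                    const std::array<std::uint8_t, 8>& b) {
    U8x8x2 r{};
    for (int i = 0; i < 8; ++i) {
        int pos = 2 * i;  // a[i] lands at position 2i, b[i] at 2i + 1
        r.val[pos / 8][pos % 8] = a[i];
        r.val[(pos + 1) / 8][(pos + 1) % 8] = b[i];
    }
    return r;
}
```

For a = {0, 1, ..., 7} and b = {10, 11, ..., 17}, `val[0]` holds 0 10 1 11 2 12 3 13 and `val[1]` holds 4 14 5 15 6 16 7 17. These are two independent 64-bit vectors, which is all the type promises about where they live.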

Now the answer:

A uint8x8x2_t may consist of two non-consecutive doubleword registers. A uint16x8_t must occupy two consecutive doubleword registers, the first of which has an even index: Qn aliases D2n and D2n+1, which is how D0-D31 map onto Q0-Q15.

Because of this, you can't simply cast a vector array data type holding two doubleword registers to a quadword register.
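The pairing rule behind this argument can be written as a small model. This is illustrative only. `q_for_d_pair` is a hypothetical name, and the model describes the AArch32 register file used in the answer. On AArch64, each of V0-V31 is a full 128-bit register whose low half is the corresponding D register, so this D-pairing rule does not apply there.

```cpp
#include <optional>

// Toy model of AArch32 NEON register aliasing: Qn overlays D(2n) and D(2n+1).
// Returns the Q register index formed by the pair (d_lo, d_hi), or
// std::nullopt if that pair of D registers cannot be viewed as one Q register.
std::optional<int> q_for_d_pair(int d_lo, int d_hi) {
    bool in_range = d_lo >= 0 && d_hi <= 31;   // only D0-D31 exist
    bool even_start = d_lo % 2 == 0;           // low half must have an even index
    bool consecutive = d_hi == d_lo + 1;       // high half must be the next register
    if (in_range && even_start && consecutive) {
        return d_lo / 2;
    }
    return std::nullopt;
}
```

For example, (D4, D5) forms Q2, but (D3, D4) and (D0, D7) are not a Q register. A uint8x8x2_t is allowed to be either kind of pair.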

The compiler may be smart enough to help you, or you can force the conversion. Either way, I would check the generated assembly for correctness as well as performance.
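One common way to force the conversion uses two standard ACLE intrinsics. vcombine_u8 joins the two 64-bit halves into one 128-bit vector, and vreinterpretq_u16_u8 relabels those 16 bytes as eight 16-bit lanes. The sketch below wraps this in a hypothetical helper, `zip_bytes_as_u16`, which takes and returns plain arrays so it can be tried on any machine. On non-ARM builds it falls back to a scalar path that produces the same bytes. It assumes a little-endian target. As the answer says, inspect the generated assembly: depending on register allocation, the compiler may need extra moves to make the two halves adjacent.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical helper: interleave the bytes of a and b (as vzip_u8 does)
// and return the 16-byte result viewed as eight 16-bit lanes.
std::array<std::uint16_t, 8> zip_bytes_as_u16(const std::array<std::uint8_t, 8>& a,
                                              const std::array<std::uint8_t, 8>& b) {
    std::array<std::uint16_t, 8> out{};
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint8x8x2_t z = vzip_u8(vld1_u8(a.data()), vld1_u8(b.data()));
    // Join val[0] (low half) and val[1] (high half) into one Q-sized vector,
    // then reinterpret the same bits as 8 x u16 without changing them.
    uint16x8_t q = vreinterpretq_u16_u8(vcombine_u8(z.val[0], z.val[1]));
    vst1q_u16(out.data(), q);
#else
    // Portable fallback for non-ARM builds: build the interleaved byte
    // sequence a0 b0 a1 b1 ... a7 b7 and copy it into the u16 array.
    std::uint8_t bytes[16];
    for (int i = 0; i < 8; ++i) {
        bytes[2 * i] = a[i];
        bytes[2 * i + 1] = b[i];
    }
    std::memcpy(out.data(), bytes, sizeof bytes);
#endif
    return out;
}
```

On a little-endian target, lane i of the result is `a[i] | (b[i] << 8)`. For example, a[0] = 1 and b[0] = 0x10 give 0x1001 in lane 0.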
