Convert _mm_shuffle_epi32 to C expression for the permutation?
Question
I'm working on a port of SSE2 to NEON. The port is at an early stage and it's producing incorrect results. Part of the reason for the incorrect results is _mm_shuffle_epi32 and the NEON instructions I selected.
The documentation for _mm_shuffle_epi32 is on the lean side from Microsoft. The Intel documentation is better, but it's not clear to me what some of the pseudo-code is doing.
SELECT4(src, control)
{
    CASE(control[1:0])
    0: tmp[31:0] := src[31:0]
    1: tmp[31:0] := src[63:32]
    2: tmp[31:0] := src[95:64]
    3: tmp[31:0] := src[127:96]
    ESAC
    RETURN tmp[31:0]
}
dst[31:0]   := SELECT4(a[127:0], imm8[1:0])
dst[63:32]  := SELECT4(a[127:0], imm8[3:2])
dst[95:64]  := SELECT4(a[127:0], imm8[5:4])
dst[127:96] := SELECT4(a[127:0], imm8[7:6])
I need help envisioning what _mm_shuffle_epi32 does. Or more correctly, the permutation that the immediate applies to the value. I guess I need to see it as basic C with ANDs and ORs.
Given C statements and macros like:
v2 = _mm_shuffle_epi32(v1, _MM_SHUFFLE(i1,i2,i3,i4));
What does the resulting C expression look like when it's unrolled into basic C statements?
Recommended answer
There's no AND/OR going on, unless you need to unpack the 8-bit integer holding the four 2-bit indices.
Make your own definition for _MM_SHUFFLE that expands to four args, instead of packing them.
Something like:
// dst = _mm_shuffle_epi32(src, _MM_SHUFFLE(d,c,b,a))
void pshufd(int dst[4], int src[4], int d, int c, int b, int a)
{   // note that the _MM_SHUFFLE args are high-element-first order
    dst[0] = src[a];
    dst[1] = src[b];
    dst[2] = src[c];
    dst[3] = src[d];
}
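If the indices do arrive packed in a single immediate, the unpacking is where the shifts and ANDs come in. The following is a minimal sketch of that case, not a drop-in replacement for the intrinsic: `MY_SHUFFLE` and `pshufd_imm8` are hypothetical names made up for illustration, and the macro is assumed to pack the same way `_MM_SHUFFLE` does (highest element in the top two bits).

```c
#include <stdint.h>

/* Hypothetical stand-in for _MM_SHUFFLE: packs four 2-bit indices,
   highest destination element first, into one 8-bit immediate. */
#define MY_SHUFFLE(d, c, b, a) \
    ((uint8_t)(((d) << 6) | ((c) << 4) | ((b) << 2) | (a)))

/* Scalar model of dst = _mm_shuffle_epi32(src, imm8).
   dst and src must not alias, because every dst lane reads from src. */
static void pshufd_imm8(int32_t dst[4], const int32_t src[4], uint8_t imm8)
{
    for (int i = 0; i < 4; i++) {
        /* Bits [2i+1 : 2i] of imm8 select the source lane for dst[i]. */
        dst[i] = src[(imm8 >> (2 * i)) & 3];
    }
}
```

With `src = {10, 11, 12, 13}`, calling `pshufd_imm8(dst, src, MY_SHUFFLE(0, 1, 2, 3))` reverses the lanes, giving `dst = {13, 12, 11, 10}`.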
Vectors are indexed from low element = 0. The low element is the one that is stored to memory at the lowest address, but when values are in registers you should think about them as [ 3 2 1 0 ]. In this notation, vector right-shifts (like psrldq) actually shift to the right.
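To make the shift direction concrete, here is a rough scalar model of a whole-vector byte shift in the style of psrldq. The name `psrldq_model` is invented for this sketch, and the vector is represented as a plain 16-byte array with byte 0 as the low byte.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of psrldq: shift the whole 16-byte vector right by
   `count` bytes, filling the vacated high bytes with zeros.
   Byte 0 is the low byte, i.e. the one at the lowest memory address. */
static void psrldq_model(uint8_t dst[16], const uint8_t src[16], unsigned count)
{
    uint8_t tmp[16] = {0};
    if (count < 16) {
        /* Higher-indexed bytes move down toward index 0. */
        memcpy(tmp, src + count, 16 - count);
    }
    /* Counts of 16 or more leave the result all zero. */
    memcpy(dst, tmp, 16);
}
```

Shifting by 4 bytes moves 32-bit element 1 into element 0, element 2 into element 1, and so on. Written high-element-first, [ 3 2 1 0 ] becomes [ 0 3 2 1 ], so everything visibly moves one slot to the right.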
This is why _mm_set_epi32(3, 2, 1, 0) takes its args in reverse order from int foo[] = { 0, 1, 2, 3 };.
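The argument order can be modelled in plain C as well. This is only an illustrative sketch: `set_epi32_model` is a hypothetical helper that writes into an int array instead of returning a vector register.

```c
#include <stdint.h>

/* Scalar model of _mm_set_epi32(e3, e2, e1, e0): arguments are written
   high element first, matching the [ 3 2 1 0 ] register picture. */
static void set_epi32_model(int32_t v[4],
                            int32_t e3, int32_t e2, int32_t e1, int32_t e0)
{
    v[0] = e0;  /* low element, lowest memory address */
    v[1] = e1;
    v[2] = e2;
    v[3] = e3;  /* high element */
}
```

Calling `set_epi32_model(v, 3, 2, 1, 0)` leaves `v` equal to `{ 0, 1, 2, 3 }`, the same layout as the `foo` array initializer.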