Convert _mm_shuffle_epi32 to C expression for the permutation?


Question

I'm porting SSE2 code to NEON. The port is at an early stage and it's producing incorrect results. Part of the reason for the incorrect results is _mm_shuffle_epi32 and the NEON instructions I selected to replace it.

Microsoft's documentation for _mm_shuffle_epi32 is on the lean side. The Intel documentation is better, but it's not clear to me what some of the pseudo-code is doing.

SELECT4(src, control)
{
    CASE(control[1:0])
        0: tmp[31:0] := src[31:0]
        1: tmp[31:0] := src[63:32]
        2: tmp[31:0] := src[95:64]
        3: tmp[31:0] := src[127:96]
    ESAC
    RETURN tmp[31:0]
}

dst[31:0] := SELECT4(a[127:0], imm8[1:0])
dst[63:32] := SELECT4(a[127:0], imm8[3:2])
dst[95:64] := SELECT4(a[127:0], imm8[5:4])
dst[127:96] := SELECT4(a[127:0], imm8[7:6])

I need help envisioning what _mm_shuffle_epi32 does, or more precisely, the permutation that the immediate applies to the value. I guess I need to see it as basic C with ANDs and ORs.

Given C statements and macros like:

v2 = _mm_shuffle_epi32(v1, _MM_SHUFFLE(i1,i2,i3,i4));

What does the resulting C expression look like when it's unrolled into basic C statements?

Recommended answer

There's no AND/OR going on, unless you need to unpack the 8-bit integer holding the four 2-bit indices.

Make your own definition of _MM_SHUFFLE that expands to four args, instead of packing them.

Something like:

// dst = _mm_shuffle_epi32(src, _MM_SHUFFLE(d,c,b,a))
void pshufd(int dst[4], int src[4], int d,int c,int b,int a)
{   // note that the _MM_SHUFFLE args are high-element-first order
    dst[0] = src[a];
    dst[1] = src[b];
    dst[2] = src[c];
    dst[3] = src[d];
}
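
If you do keep the packed 8-bit immediate, the unpacking is the only place shifts and masks appear. The following is a minimal sketch of that case on plain arrays, not a drop-in replacement for the intrinsic. The names pshufd_imm8 and SHUFFLE_IMM8 are hypothetical; the macro assumes the same bit packing as _MM_SHUFFLE, and the temporary copy is only there so that dst and src may be the same array.

```c
#include <stdint.h>

/* Hypothetical name: packs four 2-bit indices the same way _MM_SHUFFLE does,
   with the highest element's index first. */
#define SHUFFLE_IMM8(d, c, b, a) (((d) << 6) | ((c) << 4) | ((b) << 2) | (a))

/* Hypothetical helper modelling dst = _mm_shuffle_epi32(src, imm8).
   Element 0 is the low 32 bits of the vector. */
static void pshufd_imm8(int32_t dst[4], const int32_t src[4], unsigned imm8)
{
    int32_t tmp[4];

    for (int i = 0; i < 4; i++) {
        /* Bits [2*i+1 : 2*i] of imm8 select the source element for dst[i]. */
        unsigned idx = (imm8 >> (2 * i)) & 3u;
        tmp[i] = src[idx];
    }

    /* Copy out afterwards so in-place use (dst == src) stays correct. */
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[i];
}
```

For example, with src = { 10, 11, 12, 13 }, calling pshufd_imm8(dst, src, SHUFFLE_IMM8(0, 1, 2, 3)) reverses the elements, and an immediate of 0x4E swaps the low and high 64-bit halves.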

Vectors are indexed from low element = 0. The low element is the one that is stored to memory at the lowest address, but when values are in registers you should think about them as [ 3 2 1 0 ]. In this notation, vector right-shifts (like psrldq) actually shift to the right.
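
To make the [ 3 2 1 0 ] picture concrete, here is a rough array model of a byte-wise right shift restricted to whole 32-bit elements (so a shift of n elements corresponds to psrldq by 4*n bytes). The function name is hypothetical and the model ignores shift counts that are not a multiple of 4 bytes.

```c
#include <stdint.h>

/* Hypothetical model of psrldq by 4*n bytes on a 4 x 32-bit vector.
   Written as [ 3 2 1 0 ], every element moves n places to the right,
   which means element i receives what was in element i + n.
   Zeros are shifted in at the high end. */
static void srl_elements_model(int32_t dst[4], const int32_t src[4], unsigned n)
{
    for (unsigned i = 0; i < 4; i++) {
        /* Reads always come from an index >= i, so dst == src is safe. */
        dst[i] = (n < 4 && i + n < 4) ? src[i + n] : 0;
    }
}
```

With src = { 0, 1, 2, 3 } in memory order, which reads as [ 3 2 1 0 ] in register order, shifting by one element gives [ 0 3 2 1 ], i.e. { 1, 2, 3, 0 } in memory order.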

This is why _mm_set_epi32(3, 2, 1, 0) takes its args in the reverse order from int foo[] = { 0, 1, 2, 3 };.
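
The argument order can be sketched with a plain array stand-in. The function below is a hypothetical model, not the real intrinsic; it only shows that the last argument ends up in element 0, the one stored at the lowest address.

```c
#include <stdint.h>

/* Hypothetical stand-in for _mm_set_epi32(e3, e2, e1, e0).
   Arguments are given high element first, matching the [ 3 2 1 0 ] notation. */
static void set_epi32_model(int32_t v[4],
                            int32_t e3, int32_t e2, int32_t e1, int32_t e0)
{
    v[0] = e0;  /* low element, lowest memory address */
    v[1] = e1;
    v[2] = e2;
    v[3] = e3;  /* high element */
}
```

Calling set_epi32_model(v, 3, 2, 1, 0) leaves v holding the same values as int foo[] = { 0, 1, 2, 3 }.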
