Left shift of a 128-bit number using AVX2 instructions


Question

I am trying to rotate a 128-bit number to the left with AVX2. Since there is no direct way to do this, I have tried to build it from a left shift and a right shift.

Here is an excerpt from my code.

        l = 4;
        r = 4;
        targetrotate = _mm_set_epi64x (l, r);
        targetleftrotate = _mm_sllv_epi64 (target, targetrotate);

The above code snippet rotates target by 4 to the left.
When I tested it with a sample input, I could see that the result was not rotated correctly.

Here are the sample input and output:

          input: 01 23 45 67 89 ab cd ef   fe dc ba 98 76 54 32 10
obtained output: 10 30 52 74 96 b8 da fc   e0 cf ad 8b 69 47 25 03

However, the output I expected is:

                 12 34 56 78 9a bc de f0   ed cb a9 87 65 43 21 00

I know that I am doing something wrong. I want to know whether my expected output is right and, if so, what I am doing wrong here.

Any kind of help would be greatly appreciated. Thanks in advance.

Recommended answer

I think you have an endianness issue in how you're printing your input and output.

The left-most bytes within each 64-bit half are the least-significant bytes in your actual output, so 0xfe << 4 becomes 0xe0, with the f shifting into the next higher byte.

See the convention for displaying vector registers.

Your "expected" output matches what you'd get if you were printing values high element first (highest address when stored). But that's not what you're doing; you're printing each byte separately in ascending memory order. x86 is little-endian. This conflicts with the numeral system we use in English, where we read Arabic numerals from left to right with the highest place value on the left, which is effectively human big-endian. Fun fact: the Arabic language reads from right to left, so for its readers written numbers are "human little-endian".

(And across elements, higher elements are at higher addresses; printing high elements first makes whole-vector shifts like _mm_bslli_si128, aka pslldq, make sense in the way they shift bytes left between elements.)
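To make the byte-order effect concrete, here is a minimal sketch in C. The helper names question_input and format_bytes are made up for illustration and are not part of the original code. The sketch assumes an x86 target and uses only SSE2 intrinsics so that it builds without extra compiler flags.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; baseline on x86-64 */
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: the question's input, built so that its bytes in
   memory read 01 23 45 67 89 ab cd ef  fe dc ba 98 76 54 32 10. */
static __m128i question_input(void)
{
    return _mm_set_epi64x(0x1032547698badcfeLL,               /* high element */
                          (long long)0xefcdab8967452301ULL);  /* low element  */
}

/* Hypothetical helper: write the 16 bytes of v as hex into out
   (at least 48 chars). high_first == 0 prints in ascending memory order,
   as in the question; high_first != 0 prints the most-significant byte
   on the left. */
static void format_bytes(__m128i v, int high_first, char *out)
{
    uint8_t b[16];
    _mm_storeu_si128((__m128i *)b, v);
    for (int i = 0; i < 16; i++) {
        int idx = high_first ? 15 - i : i;
        out += sprintf(out, "%s%02x", i ? " " : "", (unsigned)b[idx]);
    }
}
```

Formatting _mm_slli_epi64(question_input(), 4) in memory order reproduces the "obtained output" line from the question. Formatting the same vector high-first gives 03 25 47 69 8b ad cf e0 fc da b8 96 74 52 30 10, which reads naturally as two 64-bit values each shifted left by 4. The immediate shift _mm_slli_epi64 stands in here for the question's _mm_sllv_epi64; with a count of 4 in both elements the two give the same result.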

If you're using a debugger, you're probably printing from within it. If you're using debug-prints, see "print a __m128i variable".

BTW, you can use _mm_set1_epi64x(4) to put the same value in both elements of a vector, instead of using separate l and r variables with the same value.

In _mm_set intrinsics, the high elements come first, matching the diagrams in Intel's asm manuals and matching the semantic meaning of a "left" shift moving bits/bytes to the left. (For example, see Intel's diagrams and element numbering for pshufd, _mm_shuffle_epi32.)
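The argument order of the _mm_set intrinsics is easy to check with a small sketch. The helper names element64, high_then_low and shift_counts are hypothetical, and the constants 0x1111 and 0x2222 are arbitrary values chosen only to tell the two elements apart.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; baseline on x86-64 */
#include <stdint.h>

/* Hypothetical helper: return 64-bit element i, where 0 is the low element
   (the one at the lowest address when the vector is stored). */
static uint64_t element64(__m128i v, int i)
{
    uint64_t e[2];
    _mm_storeu_si128((__m128i *)e, v);
    return e[i & 1];
}

/* _mm_set_epi64x takes the HIGH element first, then the low one. */
static __m128i high_then_low(void)
{
    return _mm_set_epi64x(0x1111, 0x2222);
}

/* One call replaces the separate l and r variables from the question. */
static __m128i shift_counts(void)
{
    return _mm_set1_epi64x(4);
}
```

With these definitions, element64(high_then_low(), 0) is 0x2222 and element64(high_then_low(), 1) is 0x1111, so the first argument ends up at the higher address. Both elements of shift_counts() hold 4.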

BTW, AVX512 has vprolvq rotates. But yes, to emulate rotates you want a SIMD version of (x << n) | (x >> (64-n)). Note that x86 SIMD shifts saturate the shift count, unlike scalar shifts, which mask the count. So x >> 64 will shift out all the bits. If you want to support rotate counts above 63, you probably need to mask the count yourself.

(See also "Best practices for circular shift (rotate) operations in C++"; but since you're using intrinsics, you don't have to worry about C shift-count UB, just the actual known hardware behaviour.)
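As a rough sketch of that emulation, the following C code rotates each 64-bit element and then, as an extension the answer does not spell out, the full 128-bit value. The function names rotl_each_epi64 and rotl_si128 are hypothetical. It assumes a single count shared by both elements and therefore uses the SSE2 shifts _mm_sll_epi64 and _mm_srl_epi64; the AVX2 per-element shifts _mm_sllv_epi64 and _mm_srlv_epi64 could be substituted if each element needed its own count.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; baseline on x86-64 */

/* Rotate each 64-bit element left by n bits (n is reduced modulo 64). */
static __m128i rotl_each_epi64(__m128i x, unsigned n)
{
    n &= 63;  /* mask explicitly: SIMD shifts saturate the count, they do not wrap it */
    __m128i left  = _mm_sll_epi64(x, _mm_cvtsi32_si128((int)n));
    /* For n == 0 the count is 64, which shifts out every bit and yields 0. */
    __m128i right = _mm_srl_epi64(x, _mm_cvtsi32_si128((int)(64 - n)));
    return _mm_or_si128(left, right);
}

/* Rotate the whole 128-bit value left by n bits (n is reduced modulo 128). */
static __m128i rotl_si128(__m128i x, unsigned n)
{
    n &= 127;
    if (n & 64)  /* rotating by 64 just swaps the two 64-bit halves */
        x = _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2));
    n &= 63;
    __m128i swapped = _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2));
    __m128i left    = _mm_sll_epi64(x, _mm_cvtsi32_si128((int)n));
    /* Bits shifted out of one half enter the bottom of the other half. */
    __m128i carry   = _mm_srl_epi64(swapped, _mm_cvtsi32_si128((int)(64 - n)));
    return _mm_or_si128(left, carry);
}
```

The saturating behaviour works in the sketch's favour for n == 0: the right shift by 64 produces zero, so the OR returns x unchanged without a special case. A scalar C version of the same expression would have undefined behaviour for that count.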
