Left shift of a 128-bit number using AVX2 instructions
Problem description
I am trying to do a left rotation of a 128-bit number in AVX2. Since there is no direct way to do this, I have tried to accomplish it with a left shift and a right shift.
Here is a snippet of my code.
l = 4;
r = 4;
targetrotate = _mm_set_epi64x (l, r);
targetleftrotate = _mm_sllv_epi64 (target, targetrotate);
The above code snippet is meant to rotate target left by 4.
When I tested the above code with a sample input, I could see that the result is not rotated correctly.
Here is the sample input and output:
input: 01 23 45 67 89 ab cd ef fe dc ba 98 76 54 32 10
obtained output: 10 30 52 74 96 b8 da fc e0 cf ad 8b 69 47 25 03
However, the output I expected is
12 34 56 78 9a bc de f0 ed cb a9 87 65 43 21 00
I know that I am doing something wrong. I want to know whether my expected output is right and, if so, what I am doing wrong here.
Any kind of help would be greatly appreciated. Thanks in advance.
Recommended answer
I think you have an endian issue with how you're printing your input and output.
The left-most bytes within each 64-bit half are the least-significant bytes in your actual output, so 0xfe << 4 becomes 0xe0, with the f shifting into a higher byte.
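The byte-order effect described above can be reproduced without any SIMD at all. The following is only a sketch: shift_lane_bytes is a hypothetical helper name, and it assumes a little-endian host such as x86. It treats 8 bytes as one 64-bit lane, shifts it, and writes the bytes back in ascending memory order, which is the order a byte-by-byte dump shows.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original post).
   Shifts one 64-bit lane left by `count` bits and writes the result's bytes
   to `out` in ascending memory order. Assumes a little-endian host such as
   x86, and requires count < 64 (a larger count is undefined in scalar C). */
static void shift_lane_bytes(const uint8_t in[8], unsigned count, uint8_t out[8])
{
    uint64_t lane;
    memcpy(&lane, in, sizeof lane);   /* in[0] becomes the least-significant byte */
    lane <<= count;                   /* same result as one lane of a SIMD 64-bit left shift */
    memcpy(out, &lane, sizeof lane);  /* out[0] is again the least-significant byte */
}
```

Feeding it the second half of the question's input (fe dc ba 98 76 54 32 10) with a count of 4 gives e0 cf ad 8b 69 47 25 03, which is exactly the second half of the obtained output. The shift itself is fine; only the printing order makes it look wrong.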
See Convention for displaying vector registers.
Your "expected" output matches what you'd get if you printed values high element first (highest address first, when stored). But that's not what you're doing; you're printing each byte separately in ascending memory order. x86 is little-endian. This conflicts with the numeral system we use in English, where we read Arabic numerals from left to right with the highest place value on the left, which is effectively human big-endian. Fun fact: the Arabic language reads from right to left, so for Arabic readers written numbers are "human little-endian".
(And across elements, higher elements are at higher addresses; printing high elements first makes whole-vector shifts like _mm_bslli_si128, aka pslldq, make sense in the way they shift bytes left between elements.)
If you're using a debugger, you're probably printing within that. If you're using debug-prints, see "print a __m128i variable".
BTW, you can use _mm_set1_epi64x(4) to put the same value in both elements of a vector, instead of using separate l and r variables with the same value.
In _mm_set intrinsics, the high elements come first, matching the diagrams in Intel's asm manuals, and matching the semantic meaning of a "left" shift moving bits/bytes to the left. (e.g. see Intel's diagrams and element-numbering for pshufd, _mm_shuffle_epi32.)
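To make the argument order concrete, here is a small sketch using only SSE2 intrinsics. The helper names get_lane and make_vec are hypothetical, introduced just for illustration. It shows that the first argument of _mm_set_epi64x ends up in the high element, which is stored at the higher address.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: store the vector to memory and return 64-bit
   element i, where 0 is the low element (lowest address) and 1 is the
   high element. */
static uint64_t get_lane(__m128i v, int i)
{
    uint64_t lanes[2];
    _mm_storeu_si128((__m128i *)lanes, v);
    return lanes[i];
}

/* Hypothetical helper: _mm_set_epi64x takes the HIGH element first. */
static __m128i make_vec(uint64_t hi, uint64_t lo)
{
    return _mm_set_epi64x((long long)hi, (long long)lo);
}
```

With this, get_lane(make_vec(0x1111, 0x2222), 0) is 0x2222 and element 1 is 0x1111, while _mm_set1_epi64x(4) yields 4 in both elements.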
BTW, AVX512 has vprolvq rotates (within each 64-bit element). But yes, to emulate rotates you want a SIMD version of (x << n) | (x >> (64-n)). Note that x86 SIMD shifts saturate the shift count, unlike scalar shifts, which mask the count. So x >> 64 will shift out all the bits. If you want to support rotate counts above 63, you probably need to mask.
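As a rough sketch of that emulation, the code below uses only SSE2 intrinsics, so it also builds without AVX2. The function names rotl64_lanes and rotl128 are hypothetical, and both assume a count in the range 0 to 63. The first rotates each 64-bit element independently; the second rotates the whole 128-bit value by taking the carried-in bits from the other half, which is one possible way to get what the question asks for.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: rotate each 64-bit element left by n, 0 <= n <= 63.
   For n == 0 the right-shift count is 64; SIMD shifts saturate, so that
   term becomes all zeros and the result is just x. */
static __m128i rotl64_lanes(__m128i x, unsigned n)
{
    __m128i left  = _mm_cvtsi32_si128((int)n);         /* count in low 64 bits */
    __m128i right = _mm_cvtsi32_si128((int)(64 - n));
    return _mm_or_si128(_mm_sll_epi64(x, left), _mm_srl_epi64(x, right));
}

/* Hypothetical helper: rotate the full 128-bit value left by n, 0 <= n <= 63.
   Each half receives the bits shifted out of the other half. */
static __m128i rotl128(__m128i x, unsigned n)
{
    __m128i left    = _mm_cvtsi32_si128((int)n);
    __m128i right   = _mm_cvtsi32_si128((int)(64 - n));
    __m128i swapped = _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)); /* swap 64-bit halves */
    return _mm_or_si128(_mm_sll_epi64(x, left), _mm_srl_epi64(swapped, right));
}
```

For example, with the high element 0x0123456789abcdef and the low element 0xfedcba9876543210, rotl128(x, 4) gives 0x123456789abcdeff in the high element and 0xedcba98765432100 in the low element. Counts of 64 or more are not handled here; they would need the count masked and, for the 128-bit case, an extra swap of the halves.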
(See Best practices for circular shift (rotate) operations in C++. But you're using intrinsics, so you don't have to worry about C shift-count UB, just the actual known hardware behaviour.)