Converting from Source-based Indices to Destination-based Indices
Question
I'm using AVX2 instructions in some C code.
The VPERMD instruction takes two vectors of eight 32-bit integers, a and idx, and generates a third one, dst, by permuting a based on idx. This seems equivalent to dst[i] = a[idx[i]] for i in 0..7. I'm calling this source-based, because the move is indexed based on the source.
However, my calculated indices are in destination-based form. This is natural for setting an array, and is equivalent to dst[idx[i]] = a[i] for i in 0..7.
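To make the two conventions concrete, here is a minimal scalar sketch in C. The function names permute_src_based and permute_dst_based are made up for illustration and are not part of any library. Masking the index with 7 mirrors the fact that VPERMD uses only the low 3 bits of each index; applying the same mask to the destination-based form is an assumption added here to keep the writes in bounds.

```c
#include <stddef.h>
#include <stdint.h>

/* Source-based ("gather") form, the VPERMD convention: dst[i] = a[idx[i]]. */
static void permute_src_based(const uint32_t a[8], const uint32_t idx[8],
                              uint32_t dst[8])
{
    for (size_t i = 0; i < 8; i++)
        dst[i] = a[idx[i] & 7];   /* VPERMD looks only at the low 3 bits */
}

/* Destination-based ("scatter") form: dst[idx[i]] = a[i].
   The & 7 is an assumption made here so that writes stay inside dst. */
static void permute_dst_based(const uint32_t a[8], const uint32_t idx[8],
                              uint32_t dst[8])
{
    for (size_t i = 0; i < 8; i++)
        dst[idx[i] & 7] = a[i];
}
```

If idx is a true permutation of 0..7, every element of dst is written exactly once in both forms; otherwise the destination-based form leaves some elements unwritten.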
How can I convert from source-based form to destination-based form? An example test case is:
{2 1 0 5 3 4 6 7} source-based form.
{2 1 0 4 5 3 6 7} destination-based equivalent
For this conversion I'm staying in ymm registers, which means that destination-based solutions don't work. Even inserting each element separately doesn't help: the insert instructions take only constant (immediate) indices, so the elements can't be set at positions computed at run time.
Recommended answer
I guess you're implicitly saying that you can't modify your code to calculate source-based indices in the first place? I can't think of anything you can do with x86 SIMD, other than AVX512 scatter instructions, which take dst-based indices.
Storing the vector to memory, inverting it, and reloading it might actually be best. (Or transfer to integer registers directly rather than through memory, perhaps after a vextracti128 / packusdw so that you only need two 64-bit transfers from vector to integer registers: movq and pextrq.)
But either way, you then use the values as indices to store a counter into an array in memory, and reload that array as a vector. This is still slow and ugly, and includes a store-forwarding failure delay. So it's probably worth your while to change your index-generating code to generate source-based shuffle vectors.
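The "store a counter" step amounts to inverting the permutation: for each position i, write i into slot idx[i]. A minimal scalar sketch of that step follows. The helper name invert_indices8 is hypothetical, and plain arrays stand in for the stored and reloaded vector; in AVX2 code the input array would typically be filled with _mm256_storeu_si256 and the result read back with _mm256_loadu_si256. The & 7 mask is an assumption added to keep the writes in bounds.

```c
#include <stdint.h>

/* Hypothetical helper: invert an 8-element permutation.
   idx holds the indices in one form (source- or destination-based);
   inv receives the same permutation in the other form. */
static void invert_indices8(const uint32_t idx[8], uint32_t inv[8])
{
    for (uint32_t i = 0; i < 8; i++)
        inv[idx[i] & 7] = i;   /* store the counter i at position idx[i] */
}
```

Applied to the destination-based test case {2 1 0 4 5 3 6 7}, this yields the source-based {2 1 0 5 3 4 6 7}, and applying it again gives back the original, since inversion works in both directions. The result is fully defined only when idx is a permutation of 0..7.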