The indices of non-zero bytes of an SSE/AVX register
Question
If the value of an SSE/AVX register is such that every byte is either 0 or 1, is there an efficient way to get the indices of all non-zero elements?
For example, if the xmm value is | r0=0 | r1=1 | r2=0 | r3=1 | r4=0 | r5=1 | r6=0 |...| r14=0 | r15=1 |, the result should be something like (1, 3, 5, ... , 15). The result should be placed in another __m128i variable or a char[16] array.
If it helps, we can assume that every byte of the register is either 0 or some constant non-zero value (not necessarily 1).
I am mainly wondering whether there is an instruction for this, or preferably a C/C++ intrinsic, in any SSE or AVX instruction set.
As @zx485 correctly observed, the original question was not clear enough. I was looking for any "consecutive" solution.
The example 0 1 0 1 0 1 0 1... above should result in any of the following:
- If we assume that indices start from 1, then 0 would be a termination byte and the result might be
002 004 006 008 010 012 014 016 000 000 000 000 000 000 000 000
- If we assume that a negative byte is a termination byte, the result might be
001 003 005 007 009 011 013 015 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF
- Anything else that gives consecutive bytes which we can interpret as the indices of the non-zero elements in the original value
Indeed, as @harold and @Peter Cordes suggest in the comments to the original post, one possible solution is to create a mask first (e.g. with pmovmskb) and look for the non-zero indices there. But that leads to a loop.
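The mask-and-loop idea mentioned in the comments could be sketched in C roughly as follows. This is only an illustration, not code from the thread: the helper name nonzero_byte_indices is hypothetical, the 0xFF padding follows the "negative byte as terminator" convention above, and only SSE2 intrinsics are assumed.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original post): writes the indices of
   the non-zero bytes of v to out[] in ascending order, pads the remaining
   slots with 0xFF, and returns the number of indices written. */
static int nonzero_byte_indices(__m128i v, uint8_t out[16])
{
    /* 0xFF in every byte of v that is zero, 0x00 elsewhere (pcmpeqb). */
    __m128i is_zero = _mm_cmpeq_epi8(v, _mm_setzero_si128());

    /* pmovmskb gathers the top bit of each byte into a 16-bit mask;
       inverting it makes the set bits mark the non-zero bytes. */
    unsigned mask = ~(unsigned)_mm_movemask_epi8(is_zero) & 0xFFFFu;

    int n = 0;
    memset(out, 0xFF, 16);                /* 0xFF acts as the terminator */
    for (int i = 0; mask != 0; ++i, mask >>= 1) {
        if (mask & 1u)
            out[n++] = (uint8_t)i;        /* record the 0-based byte index */
    }
    return n;
}
```

For the 0 1 0 1 ... input this fills out with 1, 3, 5, ..., 15 followed by eight 0xFF bytes. The loop is exactly the cost the question is trying to avoid; the sketch only makes that cost concrete.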
Recommended answer
Your question is unclear about whether you want the result array to be "compressed". By "compressed" I mean that the result should be consecutive. So, for example, for 0 1 0 1 0 1 0 1... there are two possibilities:
Non-consecutive:
XMM0: 000 001 000 003 000 005 000 007 000 009 000 011 000 013 000 015
Consecutive:
XMM0: 001 003 005 007 009 011 013 015 000 000 000 000 000 000 000 000
One problem with the consecutive approach is: how do you decide whether a 0 in the result is index 0 or a termination value?
I'm offering a simple solution for the first, non-consecutive approach, which should be quite fast:
.data
align 16                                         ; movdqa and pandn need 16-byte aligned memory operands
ddqZeroToFifteen db 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
ddqTestValue     db 0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1
.code
movdqa  xmm0, xmmword ptr [ddqTestValue]
pxor    xmm1, xmm1                               ; XMM1 = 0
pcmpeqb xmm0, xmm1                               ; bytes equal to 0 become 0xFF, all others 0x00
pandn   xmm0, xmmword ptr [ddqZeroToFifteen]     ; (NOT XMM0) AND indices: keep the index where the byte was non-zero
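Since the question prefers C/C++ intrinsics, the same three instructions might be written as below. This is a sketch under the assumption that SSE2 is available; the function name nonzero_indices_sparse is hypothetical and does not appear in the original answer.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: byte i of the result is i if byte i of v is
   non-zero, and 0 otherwise (the non-consecutive layout). */
static __m128i nonzero_indices_sparse(__m128i v)
{
    /* Same constant as ddqZeroToFifteen: byte i holds the value i. */
    const __m128i indices = _mm_setr_epi8(0, 1, 2, 3, 4, 5, 6, 7,
                                          8, 9, 10, 11, 12, 13, 14, 15);

    /* pcmpeqb against zero: 0xFF where the input byte is 0. */
    __m128i is_zero = _mm_cmpeq_epi8(v, _mm_setzero_si128());

    /* pandn: (~is_zero) & indices keeps the index of each non-zero byte. */
    return _mm_andnot_si128(is_zero, indices);
}
```

Note that in this layout byte 0 of the result is always 0, so a non-zero r0 cannot be told apart from a zero r0 without also looking at the input or the comparison mask.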
Just for the sake of completeness: the second, consecutive approach is not covered in this answer.