AVX512BW: handle 64-bit mask in 32-bit code with bsf / tzcnt?
Question
This is my code for a 'strlen' function using AVX512BW:
vxorps zmm0, zmm0, zmm0 ; ZMM0 = 0
vpcmpeqb k0, zmm0, [ebx] ; ebx is string and it's aligned at 64-byte boundary
kortestq k0, k0 ; 0x00 found ?
jnz .chk_0x00
For 'chk_0x00' on x86_64 there is no problem; we can handle it like this:
.chk_0x00: ; local label, matching the 'jnz .chk_0x00' above
kmovq rbx, k0
tzcnt rbx, rbx
add rax, rbx
Here we have a 64-bit register, so the whole mask fits in it. My question is about 32-bit x86, where there are no 64-bit general-purpose registers, so we have to reserve 8 bytes of memory and check the two DWORDs of the mask one at a time (this is my approach, and I want to know whether there is a better one):
.chk_0x00: ; local label, matching the 'jnz .chk_0x00' above
kmovd ebx, k0 ; move the first dword of the mask to the ebx
test ebx, ebx ; 0x00 found in the first dword ?
jz .check_next_dword
bsf ebx, ebx
add eax, ebx
jmp .done
.check_next_dword:
add eax, 32 ; 0x00 is not found in the first DWORD of the mask so we pass it by adding 32 to the length
sub esp, 8 ; reserve 8-byte from memory
kmovq [esp], k0 ; move the 8-byte MASK from k0 to our reserved memory
mov ebx, [esp+4] ; move the second DWORD of the mask to the ebx
bsf ebx, ebx
add eax, ebx
add esp, 8
In my 32-bit version, I used 'kmovd' to move the first DWORD of the mask into ebx, but I don't know how to get at the second DWORD. So I reserved 8 bytes on the stack, stored the full 8-byte mask there with kmovq, loaded the second DWORD into ebx, and checked it. Is there a better solution? (I suspect my way is not fast enough.)
Also, is it correct to use vxorps to initialize a zmm register to zero?
Recommended answer
KSHIFTRQ could be used as an alternative: it right-shifts the mask in k0 so that the top 32 bits become the low 32 bits, which can then be copied to a general-purpose register with kmovd. For example:
.check_next_dword:
add eax, 32
KSHIFTRQ k0, k0, 32 ;shift hi 32 bits to be low 32 bits
kmovd ebx, k0
...
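To make the logic of the two-step scan easier to check, here is a minimal portable C model of it. It is only a sketch of the idea, not the assembly itself: the function names `ctz32` and `first_set_bit_split` are made up for illustration, the loop in `ctz32` stands in for bsf/tzcnt, and the `lo`/`hi` parameters stand in for the two values that kmovd would read before and after the KSHIFTRQ by 32. It assumes the mask is nonzero, which the kortestq/jnz test in the question already guarantees.

```c
#include <assert.h>
#include <stdint.h>

/* Count trailing zero bits of a NONZERO 32-bit value.
 * Portable stand-in for bsf/tzcnt on a 32-bit register. */
static unsigned ctz32(uint32_t x)
{
    unsigned n = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Index (0..63) of the lowest set bit of a nonzero 64-bit mask,
 * using only 32-bit operations on its two halves.
 * lo models "kmovd ebx, k0" before the shift,
 * hi models "kmovd ebx, k0" after "kshiftrq k0, k0, 32". */
static unsigned first_set_bit_split(uint32_t lo, uint32_t hi)
{
    assert(lo != 0 || hi != 0);   /* caller already saw a nonzero mask */

    if (lo != 0)
        return ctz32(lo);         /* match is in bytes 0..31 */

    /* Low half is empty, so the high half must be nonzero:
     * skip 32 bytes and scan the upper DWORD. */
    return 32 + ctz32(hi);
}
```

For a 64-bit mask `m`, calling `first_set_bit_split((uint32_t)m, (uint32_t)(m >> 32))` gives the same index that a 64-bit tzcnt would return on `m`, which is the value to add to the running length.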
And yes, vxorps zmm0, zmm0, zmm0 will set zmm0 to zero: according to the vxorps reference, with no write mask it XORs the two source operands and writes the full result to the destination (the first operand). See also the SO question about zeroing a zmm register.