AVX512BW: handle 64-bit mask in 32-bit code with bsf / tzcnt?


Question

This is my code for a 'strlen' function using AVX512BW:

vxorps          zmm0, zmm0, zmm0   ; zmm0 = 0
vpcmpeqb        k0, zmm0, [ebx]    ; ebx points to the string, aligned on a 64-byte boundary
kortestq        k0, k0             ; any 0x00 byte found?
jnz             .chk_0x00
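To make the meaning of the mask in k0 concrete, here is a minimal scalar sketch in C of what the compare step produces for one 64-byte block. It is only a model of the instruction's result, not the vectorized code, and the function name zero_byte_mask64 is a hypothetical name chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of "vpcmpeqb k0, zmm0, [ebx]" with zmm0 = 0:
   bit i of the returned mask is set when block[i] == 0.
   zero_byte_mask64 is a hypothetical helper name, not a real API. */
uint64_t zero_byte_mask64(const unsigned char *block)
{
    assert(block != NULL);
    uint64_t mask = 0;
    for (size_t i = 0; i < 64; i++) {
        if (block[i] == 0)
            mask |= (uint64_t)1 << i;   /* one mask bit per byte lane */
    }
    return mask;
}
```

A nonzero result corresponds to the jnz being taken, and the index of the lowest set bit is the offset of the first 0x00 byte within the block.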

Now for 'chk_0x00': on x86_64 systems there is no problem, and we can handle it like this:

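.chk_0x00:
kmovq   rbx, k0       ; copy the 64-bit mask into a general-purpose register
tzcnt   rbx, rbx      ; index of the lowest set bit = offset of the first 0x00 byte
add     rax, rbx      ; add that offset to the length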

Here we have a 64-bit register, so we can move the whole mask into it. My question is about 32-bit x86 systems, where there are no 64-bit general-purpose registers, so we have to reserve 8 bytes of memory and check the two DWORDs of the mask one at a time. (This is my current approach, and I want to know whether there is a better one.)

.chk_0x00:
kmovd   ebx, k0       ; move the low DWORD of the mask to ebx
test    ebx, ebx      ; 0x00 found in the low DWORD?
jz      .check_next_dword
bsf     ebx, ebx      ; index of the first 0x00 byte within the low 32 bytes
add     eax, ebx
jmp     .done
.check_next_dword:
      add     eax, 32      ; no 0x00 in the low DWORD, so skip those 32 bytes
      sub     esp, 8       ; reserve 8 bytes on the stack
      kmovq   [esp], k0    ; store the full 8-byte mask from k0
      mov     ebx, [esp+4] ; load the high DWORD of the mask into ebx
      bsf     ebx, ebx     ; nonzero here, because kortestq found a set bit
      add     eax, ebx
      add     esp, 8       ; release the reserved stack space
.done:

In my x86 version, I use 'kmovd' to move the first DWORD of the mask into ebx, but I don't know what to do about the second DWORD. So I reserve 8 bytes on the stack, store the whole 8-byte mask there, load the second DWORD into ebx, and check it again. Is there a better solution? (I think my way is not fast enough.) Also, is it correct to use vxorps to initialize a zmm register to zero?

Recommended answer

It looks like KSHIFTRQ could be used as an alternative: it right-shifts the top 32 bits of the k0 mask register into the low 32 bits, which can then be copied to a general-purpose register with kmovd. Like this:

.check_next_dword:
      add     eax, 32
      kshiftrq k0, k0, 32  ; shift the high 32 bits down into the low 32 bits
      kmovd   ebx, k0      ; ebx = high DWORD of the original mask
    ...
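The control flow of the two-DWORD check can be modeled in portable C as a sanity check of the index arithmetic. This is a sketch under the assumption that the mask is already known to be nonzero (as kortestq guarantees in the question's code); lowest_set_bit32 and first_zero_index are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of bsf/tzcnt on a nonzero 32-bit value: index of the lowest set bit.
   Hypothetical helper, not a real API. */
static unsigned lowest_set_bit32(uint32_t v)
{
    assert(v != 0);
    unsigned n = 0;
    while ((v & 1u) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Offset of the first 0x00 byte in a 64-byte block, computed from the
   64-bit compare mask using only 32-bit halves. The mask must be nonzero. */
unsigned first_zero_index(uint64_t mask)
{
    assert(mask != 0);
    uint32_t lo = (uint32_t)mask;           /* kmovd ebx, k0 */
    if (lo != 0)
        return lowest_set_bit32(lo);        /* bsf ebx, ebx */
    uint32_t hi = (uint32_t)(mask >> 32);   /* kshiftrq k0, k0, 32 ; kmovd ebx, k0 */
    return 32 + lowest_set_bit32(hi);       /* add eax, 32 ; bsf ; add */
}
```

For example, a mask with only bit 40 set yields 40: the low half is zero, the high half has bit 8 set, and 32 + 8 = 40.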

And yes, vxorps zmm0, zmm0, zmm0 will set zmm0 to zero: according to the vxorps reference (https://software.intel.com/sites/default/files/managed/b4/3a/319433-024.pdf#page=1097), it XORs its two source operands, here without a mask, and writes the result to the destination register. (You may also want to check the SO question about zeroing a zmm register.)
