Why does TZCNT work for my Sandy Bridge processor?


Problem description

I'm running a Core i7 3930k, which is of the Sandy Bridge microarchitecture. When I executed the following code (compiled under MSVC19, VS2015), the results surprised me (see the comments):

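#include <bitset>
#include <cstdint>
#include <iostream>
#include <immintrin.h> // _tzcnt_u64
#include <intrin.h>    // __cpuidex

using namespace std;

int wmain(int argc, wchar_t* argv[])
{
    uint64_t r = 0b1110'0000'0000'0000ULL;
    uint64_t tzcnt = _tzcnt_u64(r);
    cout << tzcnt << endl; // prints 13

    int info[4]{};
    __cpuidex(info, 7, 0); // leaf 7, subleaf 0
    int ebx = info[1];     // EBX; bit 3 is the bmi1 flag
    cout << bitset<32>(ebx) << endl; // prints 32 zeros (including the bmi1 bit)

    return 0;
}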

The disassembly shows that the intrinsic does indeed emit a tzcnt instruction:

    uint64_t r = 0b1110'0000'0000'0000ULL;
00007FF64B44877F 48 C7 45 08 00 E0 00 00 mov         qword ptr [r],0E000h  
    uint64_t tzcnt = _tzcnt_u64(r);
00007FF64B448787 F3 48 0F BC 45 08    tzcnt       rax,qword ptr [r]  
00007FF64B44878D 48 89 45 28          mov         qword ptr [tzcnt],rax  

How come I'm not getting a #UD invalid opcode exception? The instruction executes correctly, yet the CPU reports that it does not support it.

Could this be some weird microcode revision that contains an implementation of the instruction but doesn't report support for it (or for the other instructions in bmi1)?

I haven't checked the rest of the bmi1 instructions, but I'm wondering how common this phenomenon is.

Recommended answer

The reason that Sandy Bridge (and earlier) processors seem to support lzcnt and tzcnt is that both instructions have a backward-compatible encoding:

lzcnt eax,eax  = rep bsr eax,eax
tzcnt eax,eax  = rep bsf eax,eax

On older processors the rep prefix is simply ignored, so the instruction executes as a plain bsr or bsf instead of raising #UD.
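Because the encoding never faults on older CPUs, executing tzcnt successfully proves nothing about support; the CPUID feature bits are the only reliable test. Here is a minimal sketch of decoding those bits, assuming the caller has already fetched the raw register values (for example info[1] from __cpuidex(info, 7, 0) as in the question). The helper names has_bmi1 and has_lzcnt are hypothetical; the bit positions are the documented ones (BMI1 is bit 3 of EBX for leaf 7 subleaf 0, LZCNT is bit 5 of ECX for leaf 0x80000001).

```cpp
#include <cstdint>

// Documented feature-bit positions.
// BMI1 (which includes tzcnt): CPUID.(EAX=7,ECX=0):EBX bit 3.
// LZCNT:                       CPUID.80000001H:ECX bit 5.
constexpr std::uint32_t kBmi1Bit  = 1u << 3;
constexpr std::uint32_t kLzcntBit = 1u << 5;

// Hypothetical helpers: they only decode register values supplied by
// the caller and do not execute CPUID themselves.
constexpr bool has_bmi1(std::uint32_t leaf7_ebx) {
    return (leaf7_ebx & kBmi1Bit) != 0;
}

constexpr bool has_lzcnt(std::uint32_t ext_leaf1_ecx) {
    return (ext_leaf1_ecx & kLzcntBit) != 0;
}

// The all-zero EBX printed in the question means "no BMI1".
static_assert(!has_bmi1(0x00000000u), "zero EBX reports no BMI1");
static_assert(has_bmi1(0x00000008u), "bit 3 set reports BMI1");
```

With the question's output (EBX all zeros), has_bmi1 returns false, so the tzcnt encoding on that machine is really being executed as bsf.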

So much for the good news.
The bad news is that the two versions have different semantics.

lzcnt eax,zero       => eax = 32, CF=1, ZF=0
bsr eax,zero         => eax = undefined, ZF=1
lzcnt eax,0xFFFFFFFF => eax = 0, CF=0, ZF=1   //dest = number of leading zeros (counted from the msb)
bsr eax,0xFFFFFFFF   => eax = 31, ZF=0        //dest = bit index of highest set bit

tzcnt eax,zero       => eax = 32, CF=1, ZF=0
bsf eax,zero         => eax = undefined, ZF=1
tzcnt eax,0xFFFFFFFF => eax = 0, CF=0, ZF=1   //dest = number of trailing zeros (counted from the lsb)
bsf eax,0xFFFFFFFF   => eax = 0, ZF=0         //dest = bit index of lowest set bit
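The table can be restated as a small software model, which makes the differences easy to check. This is only a sketch of the architectural results for 32-bit operands, not how the hardware computes them; the names CountResult, tzcnt_model, lzcnt_model, bsf_model and bsr_model are made up for illustration. The "undefined destination, ZF=1" case of bsf/bsr is modelled as an empty std::optional.

```cpp
#include <cstdint>
#include <optional>

// Destination value plus the two flags listed in the table above.
struct CountResult {
    std::uint32_t value;
    bool cf;
    bool zf;
};

// tzcnt: count of trailing zero bits; CF=1 if the source is zero,
// ZF=1 if the result is zero.
constexpr CountResult tzcnt_model(std::uint32_t src) {
    std::uint32_t n = 0;
    while (n < 32 && ((src >> n) & 1u) == 0) ++n;
    return {n, src == 0, n == 0};
}

// lzcnt: count of leading zero bits, same flag rules as tzcnt.
constexpr CountResult lzcnt_model(std::uint32_t src) {
    std::uint32_t n = 0;
    while (n < 32 && ((src << n) & 0x80000000u) == 0) ++n;
    return {n, src == 0, n == 0};
}

// bsf/bsr: bit index of the lowest/highest set bit.
// An empty optional stands for "destination undefined, ZF=1".
constexpr std::optional<std::uint32_t> bsf_model(std::uint32_t src) {
    if (src == 0) return std::nullopt;
    return tzcnt_model(src).value;        // same value for non-zero input
}

constexpr std::optional<std::uint32_t> bsr_model(std::uint32_t src) {
    if (src == 0) return std::nullopt;
    return 31u - lzcnt_model(src).value;  // index, not a count
}
```

For the question's input, bsf_model(0xE000) and tzcnt_model(0xE000).value are both 13, which is why the program printed the expected answer on a CPU without BMI1.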

At least bsf and tzcnt generate the same output when the source is non-zero. bsr and lzcnt do not agree even on that: for a non-zero 32-bit source, lzcnt returns 31 minus the bsr result.
Also, lzcnt and tzcnt can execute faster than bsr/bsf on processors that support them, though how much depends on the microarchitecture.
It totally sucks that bsf and tzcnt cannot agree on the flag usage. This needless inconsistency means that I cannot use tzcnt as a drop-in replacement for bsf unless I can be sure its source is non-zero and the resulting flags are not used.
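One way to sidestep the zero-source difference in C++ is to test for zero explicitly, so the result is the same whether the CPU decodes the instruction as tzcnt or as bsf. The sketch below assumes a 32-bit operand and a compiler that provides either _BitScanForward (MSVC) or __builtin_ctz (GCC/Clang); trailing_zeros_or_32 is a hypothetical name.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>   // _BitScanForward
#endif

// Returns the number of trailing zero bits in x, or 32 when x == 0.
// The explicit zero check means the bit-scan primitive only ever sees
// a non-zero source, where bsf and tzcnt produce the same value.
inline std::uint32_t trailing_zeros_or_32(std::uint32_t x) {
    if (x == 0) return 32;
#if defined(_MSC_VER)
    unsigned long index = 0;
    _BitScanForward(&index, x);   // x is non-zero here
    return static_cast<std::uint32_t>(index);
#else
    return static_cast<std::uint32_t>(__builtin_ctz(x));
#endif
}
```

If C++20 is available, std::countr_zero from the <bit> header gives the same contract (it returns the bit width for a zero argument) and is likely the simpler choice.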
