What's the difference between __popcnt() and _mm_popcnt_u32()?


Question

MS Visual C++ supports 2 flavors of the popcnt instruction on CPUs with SSE4.2:

  1. __popcnt()
  2. _mm_popcnt_u32()

The only difference I found was that the docs for __popcnt() are marked as "Microsoft Specific", and _mm_popcnt_u32() seems to be an intrinsic command name (non-MS-specific).

Is this the only difference, where the MS __popcnt() just calls the HW _mm_popcnt_u32()?

Solution

These are two different intrinsic names for the same machine instruction, thanks to Intel and AMD. The instruction is the same on all CPUs that support it, and the different intrinsics also have no difference in C or C++.


The __popcnt*() builtins are for AMD's Advanced Bit Manipulation (ABM) instructions. See http://blogs.amd.com/developer/2007/09/26/barcelona-processor-feature-advanced-bit-manipulation-abm/

The _mm_popcnt_u*() intrinsics are for Intel's implementation, which isn't part of SSE4.2 per se, but was introduced around the same time. See http://en.wikipedia.org/wiki/SSE4#POPCNT_and_LZCNT

According to https://www.chessprogramming.org/Population_Count , both implementations are binary compatible, in spite of their different intrinsic names.
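To make the "same instruction, two names" claim concrete, here is a minimal sketch that compares both intrinsics against a portable bit-counting loop. The helper names `popcount_reference` and `intrinsics_agree` and the macro `HAVE_MSVC_POPCNT_INTRINSICS` are made up for illustration. The comparison is only compiled under MSVC targeting x86/x64, and it assumes the CPU it runs on actually supports popcnt; on any other compiler only the reference loop is built.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && !defined(__clang__) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>      // __popcnt
#include <nmmintrin.h>   // _mm_popcnt_u32
#define HAVE_MSVC_POPCNT_INTRINSICS 1   // hypothetical macro, for this sketch only
#endif

// Portable reference: clears the lowest set bit on each iteration.
inline unsigned popcount_reference(std::uint32_t x) {
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;
        ++n;
    }
    return n;
}

// Under MSVC on x86/x64: true when both intrinsics return the same count
// as the reference loop. Elsewhere there is nothing to compare, so the
// function simply returns true.
inline bool intrinsics_agree(std::uint32_t x) {
#if defined(HAVE_MSVC_POPCNT_INTRINSICS)
    const unsigned a = __popcnt(x);
    const unsigned b = static_cast<unsigned>(_mm_popcnt_u32(x));
    return a == b && a == popcount_reference(x);
#else
    (void)x;
    return true;
#endif
}
```

Calling `intrinsics_agree(0x12345678u)` from a small test program is one way to confirm on a given machine that both names produce the same count.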

Intel's architecture manual states that:

Before an application attempts to use the POPCNT instruction, it must check that the processor supports SSE4.2 (if CPUID.01H:ECX.SSE4_2[bit 20] = 1) and POPCNT (if CPUID.01H:ECX.POPCNT[bit 23] = 1).

AMD's AMD64 Architecture Programmer's Manual Volume 3: General Purpose and System Instructions says:

Support for the POPCNT instruction is indicated by ECX bit 23 (POPCNT) as returned by CPUID function 0000_0001h. Software MUST check the CPUID bit once per program or library initialization before using the POPCNT instruction, or inconsistent behavior may result.

I can't see any reason why popcnt would require the presence of SSE4.2, so I think that checking bit 23 of ECX is sufficient to determine popcnt's presence.
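As a rough sketch of that check, the code below reads CPUID leaf 01H and tests ECX bit 23 only. The function names (`ecx_reports_popcnt`, `ecx_reports_sse42`, `cpuid_leaf1_ecx`, `cpu_has_popcnt`) and the macro `POPCNT_DEMO_X86` are made up for this example. It assumes either MSVC's `__cpuid` from `<intrin.h>` or GCC/Clang's `__get_cpuid` from `<cpuid.h>`, and it reports "no popcnt" on non-x86 targets.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>   // __cpuid
#define POPCNT_DEMO_X86 1   // hypothetical macro, for this sketch only
#elif defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>    // __get_cpuid (GCC/Clang)
#define POPCNT_DEMO_X86 1
#endif

// ECX bit positions for CPUID leaf 01H, as quoted from the manuals above.
constexpr std::uint32_t kSse42Bit  = 1u << 20;
constexpr std::uint32_t kPopcntBit = 1u << 23;

constexpr bool ecx_reports_popcnt(std::uint32_t ecx) {
    return (ecx & kPopcntBit) != 0;
}

constexpr bool ecx_reports_sse42(std::uint32_t ecx) {
    return (ecx & kSse42Bit) != 0;
}

// Returns ECX from CPUID leaf 1, or 0 when the leaf is unavailable
// or the target is not x86.
inline std::uint32_t cpuid_leaf1_ecx() {
#if defined(POPCNT_DEMO_X86) && defined(_MSC_VER)
    int regs[4] = {0, 0, 0, 0};   // EAX, EBX, ECX, EDX
    __cpuid(regs, 1);
    return static_cast<std::uint32_t>(regs[2]);
#elif defined(POPCNT_DEMO_X86)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return 0;
    }
    return ecx;
#else
    return 0;
#endif
}

// Checks only the POPCNT bit; the SSE4.2 bit is deliberately ignored.
inline bool cpu_has_popcnt() {
    return ecx_reports_popcnt(cpuid_leaf1_ecx());
}
```

Following AMD's wording, a program would call `cpu_has_popcnt()` once during initialization and cache the result, falling back to a software bit count when it returns false.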


AMD's Barcelona, the first AMD CPU to have popcnt, didn't fully implement SSE4, so it's possible that Intel's architecture manual suggests a method for determining presence that works on Intel CPUs but fails even on AMD CPUs that do support popcnt.

Intel's current documentation for popcnt in their vol. 2 instruction-set reference manual only says "#UD If CPUID.01H:ECX.POPCNT [Bit 23] = 0", so the anti-competitive suggestion, which would have kept software from using popcnt on some AMD CPUs without SSE4.2, is gone.
