_mm_crc32_u64 poorly defined


Problem Description


Why in the world was _mm_crc32_u64(...) defined like this?

unsigned __int64 _mm_crc32_u64( unsigned __int64 crc, unsigned __int64 v );

The "crc32" instruction always accumulates a 32-bit CRC, never a 64-bit CRC (It is, after all, CRC32 not CRC64). If the machine instruction CRC32 happens to have a 64-bit destination operand, the upper 32 bits are ignored, and filled with 0's on completion, so there is NO use to EVER have a 64-bit destination. I understand why Intel allowed a 64-bit destination operand on the instruction (for uniformity), but if I want to process data quickly, I want a source operand as large as possible (i.e. 64-bits if I have that much data left, smaller for the tail ends) and always a 32-bit destination operand. But the intrinsics don't allow a 64-bit source and 32-bit destination. Note the other intrinsics:

unsigned int _mm_crc32_u8 ( unsigned int crc, unsigned char v ); 

The type of "crc" is not an 8-bit type, nor is the return type, they are 32-bits. Why is there no

unsigned int _mm_crc32_u64 ( unsigned int crc, unsigned __int64 v );

? The Intel instruction supports this, and that is the intrinsic that makes the most sense.

Does anyone have portable code (Visual Studio and GCC) to implement the latter intrinsic? Thanks. My guess is something like this:

#define CRC32(D32,S) __asm__("crc32 %0, %1" : "+xrm" (D32) : ">xrm" (S))

for GCC, and

#define CRC32(D32,S) __asm { crc32 D32, S }

for VisualStudio. Unfortunately I have little understanding of how constraints work, and little experience with the syntax and semantics of assembly level programming.
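For GCC/Clang on x86-64, the inline-asm idea can be made to work roughly as sketched below (a sketch assuming SSE4.2 hardware, not tested against MSVC; the helper names are mine). One detail the guessed constraints miss: the crc32 instruction's destination must be a general-purpose register, so the output constraint is "+r", not "+xrm":

```c
#include <stdint.h>
#include <string.h>

/* One 64-bit accumulation step. The accumulator is zero-extended into a
 * 64-bit register because crc32q requires a 64-bit destination, but the
 * instruction leaves the upper 32 bits zero, so truncating back is free. */
static inline uint32_t crc32c_step64(uint32_t crc, uint64_t v)
{
    uint64_t c = crc;                    /* zero-extend */
    __asm__("crc32q %1, %0" : "+r"(c) : "rm"(v));
    return (uint32_t)c;                  /* upper 32 bits are already 0 */
}

/* One 8-bit step for the tail of the buffer. */
static inline uint32_t crc32c_step8(uint32_t crc, uint8_t v)
{
    __asm__("crc32b %1, %0" : "+r"(crc) : "rm"(v));
    return crc;
}

/* CRC-32C of a buffer: 8 bytes at a time, then a byte-wise tail. */
static uint32_t crc32c(const void *buf, size_t len)
{
    const uint8_t *p = (const uint8_t *)buf;
    uint32_t crc = 0xFFFFFFFFu;
    while (len >= 8) {
        uint64_t x;
        memcpy(&x, p, 8);                /* unaligned-safe load */
        crc = crc32c_step64(crc, x);
        p += 8;
        len -= 8;
    }
    while (len--)
        crc = crc32c_step8(crc, *p++);
    return crc ^ 0xFFFFFFFFu;
}
```

A convenient sanity check is the standard CRC-32C test vector: crc32c("123456789", 9) should yield 0xE3069283.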

Small edit: note the macros I've defined:

#define GET_INT64(P) *(reinterpret_cast<const uint64* &>(P))++
#define GET_INT32(P) *(reinterpret_cast<const uint32* &>(P))++
#define GET_INT16(P) *(reinterpret_cast<const uint16* &>(P))++
#define GET_INT8(P)  *(reinterpret_cast<const uint8 * &>(P))++


#define DO1_HW(CR,P) CR =  _mm_crc32_u8 (CR, GET_INT8 (P))
#define DO2_HW(CR,P) CR =  _mm_crc32_u16(CR, GET_INT16(P))
#define DO4_HW(CR,P) CR =  _mm_crc32_u32(CR, GET_INT32(P))
#define DO8_HW(CR,P) CR = (_mm_crc32_u64((uint64)CR, GET_INT64(P))) & 0xFFFFFFFF;

Notice how different the last macro statement is. The lack of uniformity is certainly an indication that the intrinsic has not been defined sensibly. While the explicit (uint64) cast in the last macro is not strictly necessary, the conversion is implicit and happens anyway. Disassembling the generated code shows instructions for both casts, 32->64 and 64->32, both of which are unnecessary.

Put another way, it's _mm_crc32_u64, not _mm_crc64_u64, but they've implemented it as if it were the latter.

If I could get the definition of CRC32 above correct, then I would want to change my macros to

#define DO1_HW(CR,P) CR = CRC32(CR, GET_INT8 (P))
#define DO2_HW(CR,P) CR = CRC32(CR, GET_INT16(P))
#define DO4_HW(CR,P) CR = CRC32(CR, GET_INT32(P))
#define DO8_HW(CR,P) CR = CRC32(CR, GET_INT64(P))
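Until such an intrinsic exists, the uniform shape of the macros above can also be recovered by wrapping the existing intrinsic. A minimal sketch for GCC/Clang (the wrapper name crc32_u64 is mine; the target attribute avoids needing a global -msse4.2 compile flag, though the CPU must still support SSE4.2):

```c
#include <stdint.h>
#include <nmmintrin.h>   /* SSE4.2 CRC32 intrinsics */

/* Wrapper with the signature the intrinsic arguably should have had:
 * a 32-bit accumulator in and out, and a 64-bit source. The 32->64 and
 * 64->32 conversions here are exactly the implicit ones _mm_crc32_u64
 * forces, made visible; whether the compiler optimizes them away is
 * not guaranteed. */
__attribute__((target("sse4.2")))
static inline uint32_t crc32_u64(uint32_t crc, uint64_t v)
{
    return (uint32_t)_mm_crc32_u64((uint64_t)crc, v);
}
```

With this, the last macro becomes `#define DO8_HW(CR,P) CR = crc32_u64(CR, GET_INT64(P))`, matching the others.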

Solution

Does anyone have portable code (Visual Studio and GCC) to implement the latter intrinsic? Thanks.

My friend and I wrote a C++ SSE intrinsics wrapper that provides the preferred usage of the crc32 instruction with a 64-bit source.

http://code.google.com/p/sse-intrinsics/

See the i_crc32() intrinsic. (Sadly there are even more flaws in Intel's SSE intrinsic specifications for other instructions; see that page for more examples of flawed intrinsic design.)
