Force Clang to "perform math early" on constant values


Problem description

This is related to "How to force const propagation through an inline function?". Clang has an integrated assembler and does not use the system's assembler (which is often GNU AS (GAS)). Compilers other than Clang performed the math early, and everything "just worked".

I say "early" because @n.m. objected to describing it as "math performed by the preprocessor." But the idea is the value is known at compile time, and it should be evaluated early, like when the preprocessor evaluates a #if (X % 32 == 0).

Below, Clang 3.6 is complaining about violating a constraint. It appears the constant is not being propagated throughout:

$ export CXX=/usr/local/bin/clang++
$ $CXX --version
clang version 3.6.0 (tags/RELEASE_360/final)
Target: x86_64-apple-darwin12.6.0
...
$ make
/usr/local/bin/clang++ -DNDEBUG -g2 -O3 -Wall -fPIC -arch i386 -arch x86_64 -pipe -Wno-tautological-compare -c integer.cpp
In file included from integer.cpp:8:
In file included from ./integer.h:7:
In file included from ./secblock.h:7:
./misc.h:941:44: error: constraint 'I' expects an integer constant expression
        __asm__ ("rolb %1, %0" : "+mq" (x) : "I" ((unsigned char)(y%8)));
                                                  ^~~~~~~~~~~~~~~~~~~~
./misc.h:951:44: error: constraint 'I' expects an integer constant expression
...

The functions above are inlined template specializations:

template<> inline byte rotrFixed<byte>(byte x, unsigned int y)
{
    // The I constraint ensures we use the immediate-8 variant of the
    // shift amount y. However, y must be in [0, 31] inclusive. We
    // rely on the preprocessor to propagate the constant and perform
    // the modular reduction so the assembler generates the instruction.
    __asm__ ("rorb %1, %0" : "+mq" (x) : "I" ((unsigned char)(y%8)));
    return x;
}

They are being invoked with a const value, so the rotate amount is known at compile time. A typical caller might look like:

unsigned int x1 =  rotrFixed<byte>(1, 4);
unsigned int x2 =  rotrFixed<byte>(1, 32);

None of these [questionable] tricks would be required if GCC or Clang provided an intrinsic to perform the rotate in near constant time. I'd even settle for "perform the rotate" since they don't even have that.

What is the trick needed to get Clang to resume performing the preprocessing of the const value?


Astute readers will recognize that rotrFixed<byte>(1, 32) could be undefined behavior if written as a traditional C/C++ rotate. So we drop into assembly to avoid the C/C++ limitations and enjoy the one-instruction speedup.
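For comparison, a rotate can be written in plain C++ without undefined behavior by masking both shift counts. The following is only a sketch for illustration, not code from the library in question; the name rotr8 is hypothetical. GCC and Clang usually recognize this idiom and emit a single rotate instruction, but that is an optimization and not a guarantee.

```cpp
#include <cstdint>

// Hypothetical helper: rotate an 8-bit value right by y bits.
// Any y is accepted; the count is reduced modulo 8 first.
inline std::uint8_t rotr8(std::uint8_t x, unsigned int y)
{
    y &= 7;  // modular reduction, so y is in [0, 7]
    // (8 - y) & 7 keeps the left shift in [0, 7] even when y == 0.
    // x is promoted to int, so neither shift overflows; the cast
    // discards the bits above the low byte.
    return static_cast<std::uint8_t>((x >> y) | (x << ((8 - y) & 7)));
}
```

With this helper, rotr8(1, 4) yields 0x10 and rotr8(1, 32) leaves the value unchanged, because 32 reduces to 0.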

Curious readers may wonder why we would do this. The cryptographers call out the specs, and sometimes those specs are not sympathetic to the underlying hardware or standards bodies. Rather than changing the cryptographer's specification, we attempt to provide it verbatim to make audits easier.


A bug has been opened for this issue: LLVM Bug 24226 - Constant not propagated into inline assembly, results in "constraint 'I' expects an integer constant expression".

I don't know what guarantees Clang makes, but I know the compiler and integrated assembler claim to be compatible with GCC and GNU's assembler. And GCC and GAS provide the propagation of the constant value.

Solution

Since you seem to be out of luck trying to force a constant evaluation due to design decisions, the ror r/m8, cl form might be a good compromise:

__asm__ ("rorb %b1, %b0" : "+q,m" (x) : "c,c" (y) : "cc");

The multiple alternative constraint syntax is to 'promote' register use over memory use due to an issue with clang, covered here. I don't know if this issue has been resolved in later versions. gcc tends to be better at constraint matching and avoiding spills.

This does require loading (y) into the rcx/ecx/cl register, but the compiler can probably hide it behind another latency. Furthermore, there are no range issues for (y). rorb effectively uses (%cl % 8). The "cc" clobber isn't required.
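Wrapped in a function, the cl form might look like the sketch below. The function name rotrVariable8 and the preprocessor guard are assumptions added for illustration; the asm statement itself is the one shown above. On targets other than x86 with a GCC-compatible compiler, the sketch falls back to a masked C++ rotate so that it still compiles.

```cpp
#include <cstdint>

// Hypothetical wrapper: rotate an 8-bit value right by a run-time count.
inline std::uint8_t rotrVariable8(std::uint8_t x, unsigned int y)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    // "c" pins y to rcx/ecx/cl; %b1 prints the byte register (%cl).
    // No immediate is involved, so no constant expression is needed.
    __asm__ ("rorb %b1, %b0" : "+q,m" (x) : "c,c" (y) : "cc");
    return x;
#else
    // Portable fallback: masked shifts, equivalent to (y % 8).
    y &= 7;
    return static_cast<std::uint8_t>((x >> y) | (x << ((8 - y) & 7)));
#endif
}
```

A call such as rotrVariable8(1, 32) is accepted by both paths, since the count is effectively reduced modulo 8.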


If an expression is constant, both gcc and clang can use __builtin_constant_p :

if (__builtin_constant_p(y))
    __asm__("rorb %1, %b0" : "+q,m" (x) : "N,N" ((unsigned char) y) : "cc");
else
    ... non-constant (y) ...

or as alluded to in the mailing list:

if (__builtin_constant_p(y))
{
    if ((y &= 0x7) != 0)
        x = (x >> y) | (x << (8 - y)); /* gcc generates rotate. */
}
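One further option, which is not part of the answer above and is offered only as a sketch: when every caller passes a literal count, the count can be made a template parameter. R % 8 is then an integer constant expression in the language sense, so it does not depend on the optimizer propagating a function argument, and the "I" constraint should be satisfiable. The name rotrConst8 and the preprocessor guard are hypothetical.

```cpp
#include <cstdint>

// Hypothetical variant: the rotate amount R is a template parameter,
// so R % 8 is a compile-time constant at every instantiation.
template <unsigned int R>
inline std::uint8_t rotrConst8(std::uint8_t x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    // Immediate form of the instruction; R % 8 always lies in [0, 7],
    // which is inside the [0, 31] range the I constraint accepts.
    __asm__ ("rorb %1, %0" : "+mq" (x) : "I" (R % 8));
    return x;
#else
    // Portable fallback with masked shift counts.
    constexpr unsigned int r = R % 8;
    return static_cast<std::uint8_t>((x >> r) | (x << ((8 - r) & 7)));
#endif
}
```

The call site changes from rotrFixed<byte>(1, 4) to something like rotrConst8<4>(1), which is the cost of this approach: it only applies where the count is known when the code is written.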
