"Custom intrinsic" function for x64 instead of inline assembly possible?


Question

I am currently experimenting with the creation of highly-optimized, reusable functions for a library of mine. For instance, I write the function "is power of 2" the following way:

template<class IntType>  
inline bool is_power_of_two( const IntType x )
{
    return (x != 0) && ((x & (x - 1)) == 0);
}

This is a portable, low-maintenance implementation as an inline C++ template. This code is compiled by VC++ 2008 to the following code with branches:

is_power_of_two PROC
    test    rcx, rcx
    je  SHORT $LN3@is_power_o
    lea rax, QWORD PTR [rcx-1]
    test    rax, rcx
    jne SHORT $LN3@is_power_o
    mov al, 1
    ret 0
$LN3@is_power_o:
    xor al, al
    ret 0
is_power_of_two ENDP

I also found an implementation in "The bit twiddler", which would be coded in x64 assembly as follows:

is_power_of_two_fast PROC
    test rcx, rcx
    je  SHORT NotAPowerOfTwo
    lea rax, [rcx-1]
    and rax, rcx
    neg rax
    sbb rax, rax
    inc rax
    ret
NotAPowerOfTwo:
    xor rax, rax
    ret
is_power_of_two_fast ENDP

I tested both subroutines written separately from C++ in an assembly module (.asm file), and the second one works about 20% faster!

Yet the overhead of the function call is considerable: if I compare the second assembly implementation "is_power_of_two_fast" to the inlined version of the template function, the latter is faster despite its branches!

Unfortunately, Visual C++ does not support inline assembly when targeting x64. One is expected to use "intrinsic functions" instead.

Now the question: can I implement the faster version "is_power_of_two_fast" as a custom intrinsic function or something similar, so that it can be used inline? Or alternatively, is it possible to somehow force the compiler to produce the low-branch version of the function?

Recommended answer

Even VC 2005 is capable of producing code with the sbb instruction.

The C code

bool __declspec(noinline) IsPowOf2(unsigned int a)
{
    return (a>=1)&((a&(a-1))<1);
}

compiles to the following:

00401000  lea         eax,[ecx-1] 
00401003  and         eax,ecx 
00401005  cmp         eax,1 
00401008  sbb         eax,eax 
0040100A  neg         eax  
0040100C  cmp         ecx,1 
0040100F  sbb         ecx,ecx 
00401011  add         ecx,1 
00401014  and         eax,ecx 
00401016  ret          
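
The same trick can probably be folded back into the inline template from the question. The following is only a sketch: the names is_power_of_two_branchless and self_check are made up for illustration, and it assumes an unsigned integer type. Replacing the short-circuit && with a bitwise & means both comparisons are always evaluated, which gives the compiler the chance to emit sbb/setcc-style code instead of jumps. Whether it actually does depends on the compiler and optimization settings, so the generated assembly should be checked.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Reference version from the question: the short-circuit && may be
// compiled to conditional jumps.
template <class IntType>
inline bool is_power_of_two(const IntType x)
{
    return (x != 0) && ((x & (x - 1)) == 0);
}

// Hypothetical name, not part of the original post. Uses the expression
// from the answer: a bitwise & evaluates both comparisons unconditionally.
// Assumption: UIntType is an unsigned integer type.
template <class UIntType>
inline bool is_power_of_two_branchless(const UIntType x)
{
    static_assert(std::is_unsigned<UIntType>::value,
                  "this sketch assumes an unsigned integer type");
    return (x >= 1) & ((x & (x - 1)) < 1);
}

// Returns true if both versions agree for every value in [0, limit).
inline bool agrees_with_reference(std::uint32_t limit)
{
    for (std::uint32_t v = 0; v < limit; ++v) {
        if (is_power_of_two(v) != is_power_of_two_branchless(v)) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: a few spot checks, including 64-bit edge cases.
inline void self_check()
{
    assert(agrees_with_reference(1u << 16));
    assert(is_power_of_two_branchless(std::uint64_t(1) << 63));
    assert(!is_power_of_two_branchless((std::uint64_t(1) << 63) + 1));
    assert(!is_power_of_two_branchless(std::uint64_t(0)));
}
```

Because the function remains an ordinary inline template, it keeps the portability of the original and avoids the call overhead of a separate .asm routine. No specific speedup is implied here; it would have to be measured.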
