Why does an inline function have lower efficiency than an in-built function?

Question

I was trying a question on arrays on InterviewBit. For this question I wrote an inline function that returns the absolute value of an integer. On submission, I was told that my algorithm was not efficient. But when I switched to abs() from the C++ library, it got a correct-answer verdict.

Here is my function that got an inefficient verdict -

inline int abs(int x){return x>0 ? x : -x;}

int Solution::coverPoints(vector<int> &X, vector<int> &Y) {
    int l = X.size();
    int i = 0;
    int ans = 0;
    while (i<l-1){
        ans = ans + max(abs(X[i]-X[i+1]), abs(Y[i]-Y[i+1]));
        i++;
    }
    return ans;
}

Here's the one that got the correct answer -

int Solution::coverPoints(vector<int> &X, vector<int> &Y) {
    int l = X.size();
    int i = 0;
    int ans = 0;
    while (i<l-1){
        ans = ans + max(abs(X[i]-X[i+1]), abs(Y[i]-Y[i+1]));
        i++;
    }
    return ans;
}

Why did this happen? I thought inline functions were the fastest, since no call is made. Or is the site in error? And if the site is correct, what does C++ abs() use that is faster than my inline abs()?

Recommended answer

Your abs performs branching based on a condition, while the built-in variant can compute the result without any branch, most likely using just a couple of instructions. Possible assembly example (taken from here):

cdq
xor eax, edx
sub eax, edx

cdq copies the sign of register eax into register edx: if eax is non-negative, edx becomes zero; otherwise edx becomes 0xFFFFFFFF, which denotes -1. The xor with the original number changes nothing when the number is non-negative (any number xor 0 is unchanged). When eax is negative, however, eax xor 0xFFFFFFFF yields ~eax (bitwise not). The final step subtracts edx from eax. Again, if eax is non-negative, edx is zero and the value stays the same. For negative values, (~eax) - (-1) = -eax, which is the value wanted.
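
The same three steps can be written out in C++ as a rough sketch. The name branchless_abs is made up for illustration; the code assumes a 32-bit int type and an arithmetic right shift on signed values (guaranteed since C++20, and what mainstream compilers did before that). The xor and the subtraction are done on unsigned values so that INT32_MIN does not trigger signed overflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the cdq / xor / sub sequence.
inline std::int32_t branchless_abs(std::int32_t x) {
    // Like edx after cdq: 0 for non-negative x, -1 (all bits set) for negative x.
    // Assumes an arithmetic right shift on signed integers.
    const std::int32_t mask = x >> 31;

    // Work in unsigned arithmetic to avoid signed-overflow UB for INT32_MIN.
    const std::uint32_t ux = static_cast<std::uint32_t>(x);
    const std::uint32_t um = static_cast<std::uint32_t>(mask);

    // xor eax, edx  then  sub eax, edx
    return static_cast<std::int32_t>((ux ^ um) - um);
}
```

In practice there is rarely a reason to write this by hand, since an optimizing compiler typically produces an equivalent sequence from std::abs or from the plain ternary form.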

As you can see this approach uses only three simple arithmetic instructions and no conditional branching at all.

Edit: After some research it turned out that many built-in implementations of abs use the same approach, return __x >= 0 ? __x : -__x;, and such a pattern is an obvious target for compiler optimization to avoid unnecessary branching.

However, that does not justify the use of custom abs implementation as it violates the DRY principle and no one can guarantee that your implementation is going to be just as good for more sophisticated scenarios and/or unusual platforms. Typically one should think about rewriting some of the library functions only when there is a definite performance problem or some other defect detected in existing implementation.
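
For illustration, here is a minimal sketch of the same loop relying on the library functions. cover_points is a hypothetical free-function stand-in for Solution::coverPoints, and it assumes X and Y have the same length.

```cpp
#include <algorithm>  // std::max
#include <cassert>
#include <cstddef>    // std::size_t
#include <cstdlib>    // std::abs(int)
#include <vector>

// Hypothetical free-function version of Solution::coverPoints.
// Sums, over consecutive points, the larger of the x and y distances.
int cover_points(const std::vector<int>& X, const std::vector<int>& Y) {
    int ans = 0;
    // i + 1 < size avoids underflow when the vectors are empty.
    for (std::size_t i = 0; i + 1 < X.size(); ++i) {
        ans += std::max(std::abs(X[i] - X[i + 1]),
                        std::abs(Y[i] - Y[i + 1]));
    }
    return ans;
}
```

Qualifying the calls with std:: also makes it explicit which abs is meant, instead of depending on whether a user-defined abs happens to be in scope.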

Edit 2: Just switching from int to float shows considerable performance degradation for the custom version. The library version:

float libfoo(float x)
{
    return ::std::fabs(x);
}

andps   xmm0, xmmword ptr [rip + .LCPI0_0]

And the custom version:

inline float my_fabs(float x)
{
    return x>0.0f?x:-x;
}

float myfoo(float x)
{
    return my_fabs(x);
}

movaps  xmm1, xmmword ptr [rip + .LCPI1_0] # xmm1 = [-0.000000e+00,-0.000000e+00,-0.000000e+00,-0.000000e+00]
xorps   xmm1, xmm0
xorps   xmm2, xmm2
cmpltss xmm2, xmm0
andps   xmm0, xmm2
andnps  xmm2, xmm1
orps    xmm0, xmm2

Online compiler
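
The single andps in the library version presumably applies a constant mask that clears the sign bit. A rough C++ sketch of that idea follows; mask_fabs is a made-up name, and the code assumes a 32-bit IEEE 754 float.

```cpp
#include <cassert>
#include <cmath>    // std::signbit, used in the checks
#include <cstdint>
#include <cstring>  // std::memcpy

// Hypothetical helper: clears the sign bit of a float, as a sign mask would.
inline float mask_fabs(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined way to read the bits
    bits &= 0x7FFFFFFFu;                  // keep everything except the sign bit
    std::memcpy(&x, &bits, sizeof x);
    return x;
}
```

One likely reason the compiler cannot reduce the custom version to the same single instruction is that x>0.0f?x:-x is not equivalent to fabs for every input: for x = +0.0f the comparison is false, so it returns -0.0f, whereas fabs returns +0.0f.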
