When is assembler faster than C?

Question

One of the stated reasons for knowing assembler is that, on occasion, it can be employed to write code that will be more performant than writing that code in a higher-level language, C in particular. However, I've also heard it stated many times that although that's not entirely false, the cases where assembler can actually be used to generate more performant code are both extremely rare and require expert knowledge of and experience with assembler.

This question doesn't even get into the fact that assembler instructions will be machine-specific and non-portable, or any of the other aspects of assembler. There are plenty of good reasons for knowing assembler besides this one, of course, but this is meant to be a specific question soliciting examples and data, not an extended discourse on assembler versus higher-level languages.

Can anyone provide some specific examples of cases where assembler will be faster than well-written C code using a modern compiler, and can you support that claim with profiling evidence? I am pretty confident these cases exist, but I really want to know exactly how esoteric these cases are, since it seems to be a point of some contention.

Recommended answer

Here is a real-world example: fixed-point multiplies.

Fixed-point numbers don't only come in handy on devices without floating-point hardware. They also shine when it comes to precision, because they give you 32 bits of precision with a predictable error (a float stores only 23 mantissa bits, and its precision loss is harder to predict).
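
The shift by 16 in the answer's code implies a signed 16.16 format (16 integer bits including the sign, 16 fraction bits). As a minimal sketch assuming that format, conversion to and from double could look like this; the names to_fixed, to_double and FIX_ONE are hypothetical. Every representable value is a multiple of 2^-16, so the spacing between neighbouring values is the same across the whole range, whereas the spacing of float values grows with their magnitude.

```c
#include <stdint.h>

/* Hypothetical helpers for a signed 16.16 fixed-point format:
   16 integer bits (including the sign) and 16 fraction bits. */
#define FIX_ONE 65536 /* the value 1.0 in 16.16 */

/* Scale by 2^16 and round to nearest; no overflow check. */
static int32_t to_fixed(double x)
{
    return (int32_t)(x * FIX_ONE + (x >= 0 ? 0.5 : -0.5));
}

/* Convert back; every 16.16 value is exactly representable in a double. */
static double to_double(int32_t f)
{
    return (double)f / FIX_ONE;
}
```

The explicit rounding term matters: a plain cast would always truncate toward zero.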

One way to write a fixed-point multiply on a 32-bit architecture looks like this:

// 16.16 fixed-point multiply; assumes a 32-bit int
static inline int FixedPointMul (int a, int b)
{
  long long a_long = a; // cast to 64 bit.

  long long product = a_long * b; // perform multiplication

  return (int) (product >> 16);  // shift by the fixed point bias
}
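
A usage sketch of the same technique, assuming the 16.16 format implied by the shift of 16. It is restated here with <stdint.h> types so the snippet stands alone; the name fixed_mul and the FIX_* constants are hypothetical.

```c
#include <stdint.h>

/* Same technique as FixedPointMul: widen, multiply, drop 16 fraction bits.
   Note: >> on a negative int64_t is an arithmetic shift on mainstream
   compilers, but C leaves it implementation-defined. */
static int32_t fixed_mul(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * b; /* full 64-bit product */
    return (int32_t)(product >> 16);  /* keep bits 16..47 */
}

/* 16.16 constants: 1.5, 2.25 and -2.0 scaled by 65536. */
enum { FIX_1_5 = 0x18000, FIX_2_25 = 0x24000, FIX_NEG_2 = -0x20000 };
```

For example, fixed_mul(FIX_1_5, FIX_2_25) yields 0x36000, which is 3.375 in 16.16. Because the shift simply discards the low 16 bits, results are rounded toward negative infinity on arithmetic-shift targets; adding 1 << 15 to the product before shifting would round to nearest instead.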

The problem with this code is that it does something that can't be expressed directly in C. We want to multiply two 32-bit numbers, get a 64-bit result, and return its middle 32 bits. However, C has no such 32*32=64 multiply. All you can do is promote the integers to 64 bits and do a 64*64=64 multiply.
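
The promotion has to happen before the multiplication, not after it. A small sketch using unsigned types (so the narrow version wraps instead of hitting signed-overflow undefined behavior) shows the difference; both function names are hypothetical, and a 32-bit int is assumed.

```c
#include <stdint.h>

/* Wrong: the multiply is done in 32 bits and wraps modulo 2^32
   before the result is widened to 64 bits. */
static uint64_t mul_then_widen(uint32_t a, uint32_t b)
{
    return (uint64_t)(a * b);
}

/* Right: widening one operand first makes C perform a 64*64=64
   multiply, which holds the full 32x32 product exactly. */
static uint64_t widen_then_mul(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```

With a = b = 0x10000, the first function returns 0 while the second returns 0x100000000.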

The x86 (as well as ARM, MIPS, and others) can, however, do the multiply in a single instruction. Many compilers still ignore this fact and generate code that calls a runtime library function to do the multiply. The shift by 16 is also often done by a library routine, even though the x86 can do such shifts in hardware as well.

So we're left with one or two library calls just for a multiply. This has serious consequences: not only are the calls slower than the native instructions, but registers must be preserved across the function calls, and the calls also get in the way of inlining and loop unrolling.

If you rewrite the same code in assembler you can gain a significant speed boost.

That said, using ASM is not the best way to solve the problem. Most compilers let you use some assembler instructions in intrinsic form when you can't express them in C. The VS.NET 2008 compiler, for example, exposes the 32*32=64-bit multiply as __emul and the 64-bit shift as __ll_rshift.

Using intrinsics, you can rewrite the function in a way that gives the C compiler a chance to understand what's going on. This allows the code to be inlined and register-allocated, and common subexpression elimination and constant propagation can be done as well. You'll get a huge performance improvement over the hand-written assembler code that way.

For reference, the end result for the fixed-point multiply with the VS.NET compiler is:

#include <intrin.h> // MSVC header that declares __emul and __ll_rshift

int inline FixedPointMul (int a, int b)
{
    return (int) __ll_rshift(__emul(a,b),16);
}
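
Intrinsics such as __emul and __ll_rshift are MSVC-specific. A minimal sketch, assuming the code must also build with other compilers, keeps the intrinsic path behind a preprocessor check and falls back to the plain-C widening multiply elsewhere. The function name and the FIXED_USE_MSVC_INTRINSICS macro are hypothetical, and the _M_IX86/_M_X64 guard is an assumption about where these intrinsics are available; check your compiler's intrinsic list before relying on it.

```c
#include <stdint.h>
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h> /* declares __emul and __ll_rshift */
#define FIXED_USE_MSVC_INTRINSICS 1
#endif

/* 16.16 fixed-point multiply: MSVC intrinsics where assumed available,
   otherwise the plain-C widening idiom. Declared static rather than
   static inline so it also compiles as C on older MSVC versions. */
static int32_t fixed_mul_portable(int32_t a, int32_t b)
{
#ifdef FIXED_USE_MSVC_INTRINSICS
    return (int32_t)__ll_rshift(__emul(a, b), 16);
#else
    return (int32_t)(((int64_t)a * b) >> 16);
#endif
}
```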

By the way, the performance difference for fixed-point divides is even larger. I got improvements of up to a factor of 10 in division-heavy fixed-point code by writing a couple of asm lines.
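
The answer doesn't show its division code. As a sketch of what a 16.16 divide involves, assuming the same format: the dividend is widened and pre-scaled by 2^16, which turns the operation into a 64/32 division. On 32-bit targets that 64-bit division is commonly compiled to a runtime library call, even though the x86 IDIV instruction divides the 64-bit value in EDX:EAX by a 32-bit operand directly. This mirrors the library-call problem described for the multiply. The name fixed_div is hypothetical.

```c
#include <stdint.h>

/* Hypothetical 16.16 fixed-point divide: pre-scale the dividend so the
   quotient keeps 16 fraction bits. The caller must ensure b != 0 and
   that the quotient fits in 32 bits. Truncates toward zero (C division). */
static int32_t fixed_div(int32_t a, int32_t b)
{
    /* Multiply instead of a << 16: left-shifting a negative value is UB. */
    int64_t dividend = (int64_t)a * 65536;
    return (int32_t)(dividend / b);
}
```

For example, fixed_div(0x30000, 0x18000), that is 3.0 / 1.5, yields 0x20000 (2.0).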

Edit:

Visual C++ 2013 generates the same assembly code for both versions (the plain-C one and the intrinsic one).
