Can the length of assembly code indicate execution speed?


Question

I'm learning C. Consider the following code snippet:

#include <stdio.h>

int main(void) {
  int fahr;
  float calc;

  for (fahr = 300; fahr >= 0; fahr = fahr - 20) {
    calc = (5.0 / 9.0) * (fahr - 32);
    printf("%3d %6.1f\n", fahr, calc);
  }

  return 0;
}

This prints a Fahrenheit-to-Celsius conversion table from 300 down to 0 in steps of 20. I compile it with:

$ clang -std=c11 -Wall -g -O3 -march=native main.c -o main

I also generate assembly code with this command:

$ clang -std=c11 -Wall -S -masm=intel -O3 -march=native main.c -o main

This generates a 1.26 KB file with 71 lines.

I slightly edited the code and moved the logic into another function, which is called from main():

#include <stdio.h>

void foo(void) {
  int fahr;
  float calc;

  for (fahr = 300; fahr >= 0; fahr = fahr - 20) {
    calc = (5.0 / 9.0) * (fahr - 32);
    printf("%3d %6.1f\n", fahr, calc);
  }
}

int main(void) {
  foo();
  return 0;
}

This generates 2.33 KB of assembly code with 128 lines.

Running both programs with time ./main, I see no difference in execution speed.

My question is: is there any point in trying to optimize C programs by the length of the generated assembly code?

Recommended answer

It seems that you are comparing the sizes of the assembly listings generated by the compiler. The size of a text listing says little about the machine code, since it also contains labels and directives, so I'll assume you meant to compare the binary code size of the two compiler-generated snippets.

All other conditions being equal, shorter code may give an increase in speed (due to higher code density). In general, though, x86 CPUs are complex enough that optimizing for code size and optimizing for code speed have to be treated as separate goals.

Specifically, if you aim at code speed you should optimize for... code speed. Sometimes this requires choosing the shortest snippet, sometimes it doesn't.
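Optimizing for speed implies measuring speed rather than inferring it from size. A minimal sketch of such a measurement follows; the names sample_work and seconds_per_call are hypothetical and not part of the original question, and clock() reports CPU time with coarse resolution, so this is only a rough first check.

```c
#include <time.h>

/* Hypothetical workload, used only for illustration.
   The volatile sink keeps the compiler from removing the loop. */
static volatile unsigned long sink;

void sample_work(void) {
    for (unsigned long k = 0; k < 100000UL; ++k) {
        sink += k;
    }
}

/* Returns the average CPU time in seconds for one call of fn,
   measured over reps calls. Assumes reps > 0. */
double seconds_per_call(void (*fn)(void), int reps) {
    clock_t start = clock();
    for (int r = 0; r < reps; ++r) {
        fn();
    }
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / reps;
}
```

Calling seconds_per_call(sample_work, 1000) for each candidate version gives numbers that can be compared directly, independently of how long the generated assembly is.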

Consider the classic example of compiler optimization, multiplication by powers of two:

int i = 4;
i = i * 8;

This might be translated as follows:

;NO optimizations at all

mov eax, 4        ;i = 4        B804000000       0-1 clocks
imul eax, 8       ;i = i * 8    6BC008           3 clocks
                  ;eax = i      8 bytes total    3-4 clocks total

;Slightly optimized
;4*8 gives no sign issue, we can use shl

mov eax, 4        ;i = 4        B804000000       0-1 clocks
shl eax, 3        ;i = i * 8    C1E003           1 clock
                  ;eax = i      8 bytes total    1-2 clocks total

Both snippets have the same code length, but the second runs nearly twice as fast.

This is a very basic example [1], where there is not even much need to take the micro-architecture into account.
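At the C level, the same equivalence can be sketched as below. The function names times8_mul and times8_shl are hypothetical, and unsigned operands are an assumption made here to sidestep the sign issue mentioned in the assembly comments. An optimizing compiler will typically pick the cheaper form on its own, so writing the shift by hand rarely pays off.

```c
#include <stdint.h>

/* Multiply by 8 using the multiplication operator. */
uint32_t times8_mul(uint32_t i) {
    return i * 8u;
}

/* Multiply by 8 using a left shift by 3.
   For unsigned 32-bit values both forms wrap modulo 2^32,
   so the results are identical for every input. */
uint32_t times8_shl(uint32_t i) {
    return i << 3;
}
```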

Another, more subtle example is the following, taken from Agner Fog's discussion of partial register stalls [2]:

;Version A                        Version B

mov al, byte ptr [mem8]           movzx ebx, byte ptr [mem8]
mov ebx, eax                      and eax, 0ffffff00h
                                  or ebx, eax

;7 bytes                           14 bytes

Both versions give the same result, but Version B is 5-6 clocks faster than Version A despite being twice its size.

The answer is then no: code size alone is not enough, though it may serve as a tie-breaker.

If you are really interested in optimizing assembly, you will enjoy these two readings:

  • Agner Fog's classics.
  • Intel optimization manual

The first of these also includes a manual on optimizing C and C++ code.

If you write in C, remember that the most impactful optimizations are 1) how data is represented/stored, i.e. data structures, and 2) how data is processed, i.e. algorithms.
These are the macro optimizations.
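As an illustration of how much an algorithmic choice can matter, here is a minimal sketch comparing a linear scan with a binary search over a sorted array. The function names find_linear and find_binary and the steps counter are hypothetical, introduced only to make the difference in work visible.

```c
#include <stddef.h>

/* Linear search: returns the index of key, or -1 if absent.
   *steps receives the number of elements examined. */
long find_linear(const int *a, size_t n, int key, size_t *steps) {
    *steps = 0;
    for (size_t i = 0; i < n; ++i) {
        ++*steps;
        if (a[i] == key) {
            return (long)i;
        }
    }
    return -1;
}

/* Binary search over an array sorted in ascending order:
   returns the index of key, or -1 if absent.
   *steps receives the number of elements examined. */
long find_binary(const int *a, size_t n, int key, size_t *steps) {
    size_t lo = 0, hi = n;              /* half-open range [lo, hi) */
    *steps = 0;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        ++*steps;
        if (a[mid] == key) {
            return (long)mid;
        }
        if (a[mid] < key) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return -1;
}
```

The binary search is the longer function, yet for large sorted inputs it examines far fewer elements, which is the kind of gain that no instruction-level tuning of the linear scan can match.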

Taking the generated assembly into account means shifting to micro-optimization, and there the most useful tools are 1) a smart compiler and 2) a good set of intrinsics [3].
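To give an idea of what such a built-in looks like, the sketch below counts set bits twice: once with a portable loop and once with __builtin_popcount, a non-standard built-in provided by GCC and clang. The wrapper names popcount_portable and popcount_builtin are hypothetical. When the target supports it (for example with -march=native on a recent x86 CPU), the built-in can compile down to a single popcnt instruction.

```c
/* Portable bit count: each iteration clears the lowest set bit. */
unsigned popcount_portable(unsigned x) {
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;
        ++n;
    }
    return n;
}

/* Same result using the GCC/clang built-in (not standard C). */
unsigned popcount_builtin(unsigned x) {
    return (unsigned)__builtin_popcount(x);
}
```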

[1] So simple that it would be optimized out in practice.
[2] Maybe a little obsolete now, but it serves the purpose.
[3] Built-in, non-standard functions that translate into specific assembly instructions.
