Why does it take so many instructions to run an empty program?

Question

I recently learned about the perf command on Linux. I decided to run some experiments, so I created an empty C program and measured how many instructions it took to run:

echo 'int main(){}'>emptyprogram.c && gcc -O3 emptyprogram.c -o empty
perf stat ./empty

Here is the output:

 Performance counter stats for './empty':

      0.341833      task-clock (msec)         #    0.678 CPUs utilized          
             0      context-switches          #    0.000 K/sec                  
             0      cpu-migrations            #    0.000 K/sec                  
           112      page-faults               #    0.328 M/sec                  
     1,187,561      cycles                    #    3.474 GHz                    
     1,550,924      instructions              #    1.31  insn per cycle         
       293,281      branches                  #  857.966 M/sec                  
         4,942      branch-misses             #    1.69% of all branches        

   0.000504121 seconds time elapsed

Why does it take so many instructions to run a program that does literally nothing? I thought this might be some baseline number of instructions needed to load a program into the OS, so I looked for a minimal executable written in assembly and found a 142-byte executable that outputs "Hi World" here (http://timelessname.com/elfbin/).

Running perf stat on the 142-byte hello executable, I get:

Hi World

 Performance counter stats for './hello':

      0.069185      task-clock (msec)         #    0.203 CPUs utilized          
             0      context-switches          #    0.000 K/sec                  
             0      cpu-migrations            #    0.000 K/sec                  
             3      page-faults               #    0.043 M/sec                  
       126,942      cycles                    #    1.835 GHz                    
       116,492      instructions              #    0.92  insn per cycle         
        15,585      branches                  #  225.266 M/sec                  
         1,008      branch-misses             #    6.47% of all branches        

   0.000340627 seconds time elapsed

This still seems a lot higher than I'd expect, but we can accept it as a baseline. In that case, why did running empty take more than 10x as many instructions? What did those instructions do? And if they are some sort of overhead, why does the overhead vary so much between a C program and the hello-world assembly program?

Recommended answer

It's hardly fair to claim that it "does literally nothing". Yes, at the application level you chose to make the whole thing a giant no-op for your microbenchmark, and that's fine. But beneath the covers, at the system level, it's hardly "nothing". You asked Linux to fork off a brand new execution environment, initialize it, and connect it to the surrounding environment. You called very few glibc functions, but dynamic linking is non-trivial, and after a million instructions your process was ready to demand-fault printf() and friends, and to efficiently bring in any libraries you might have linked against or dlopen()'ed.
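One way to see some of that setup from inside a process is to look at what is already mapped into its address space by the time your own code runs. The following is only a minimal sketch, assuming a Linux system with procfs mounted; the helper name count_mappings is made up for this illustration and is not part of any library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper for illustration: count the lines of
 * /proc/self/maps that contain `needle`, or all lines when
 * `needle` is NULL. Returns -1 if the file cannot be opened
 * (for example, on a system without procfs).
 * Assumes every line fits in the buffer below. */
int count_mappings(const char *needle)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    char line[8192];
    int count = 0;
    while (fgets(line, sizeof line, maps) != NULL) {
        /* Each line describes one mapped region; file-backed
         * regions end with the path of the backing file. */
        if (needle == NULL || strstr(line, needle) != NULL)
            count++;
    }
    fclose(maps);
    return count;
}
```

Calling count_mappings(".so") from main() in a normally (dynamically) linked build would typically report several regions belonging to libc and the dynamic loader, all of which were mapped before main() started. A build linked with gcc -static should report none, which is one reason a static or hand-written executable has less startup work to do.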

This is not the sort of microbenchmark that implementors are likely to optimize against. What would be of interest is if you could identify "expensive" aspects of fork/exec that are never used in some use cases, and so might be #ifdef'd out (or have their execution short-circuited) in very specific situations. Lazy evaluation of resolv.conf is one example: a process that never interacts with IP servers never pays that overhead.
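The lazy-evaluation idea can be sketched in a few lines of C. This is not how glibc handles resolv.conf; it is only an illustration of the pattern, and every name in it (resolver_config, load_config, get_config, config_load_count) is hypothetical. The sketch is also not thread-safe.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state standing in for data parsed from a config file. */
struct resolver_config {
    bool loaded;             /* has the expensive load run yet? */
    int load_count;          /* how many times the load has run */
    const char *nameserver;  /* placeholder for parsed content */
};

/* Static storage is zero-initialized, so this starts as "not loaded". */
static struct resolver_config g_config;

/* Stand-in for the expensive step; a real resolver would read and
 * parse a file here. The address is a documentation-range placeholder. */
static void load_config(struct resolver_config *cfg)
{
    cfg->nameserver = "192.0.2.1";
    cfg->load_count++;
    cfg->loaded = true;
}

/* All consumers go through this accessor, so the cost is paid on
 * first use only, and never by a process that does not call it. */
const struct resolver_config *get_config(void)
{
    if (!g_config.loaded)
        load_config(&g_config);
    return &g_config;
}

/* Exposes the counter so callers can observe when loading happened. */
int config_load_count(void)
{
    return g_config.load_count;
}
```

A program that never calls get_config() never runs load_config() at all, while one that calls it repeatedly runs the load exactly once. That is the trade the answer describes: startup stays cheap, and the cost moves to the first caller that actually needs the data.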
