What is a reasonable minimum number of assembly instructions for a small C program including setup?

Question

I'm trying to generate the smallest C program possible to see how many instructions are executed by running it. I disabled use of libraries and disabled vdso. Yet, my C program, which gdb says is 7 assembly instructions, ends up executing 17k instructions according to perf stat.

Is this a normal amount of instructions just to set up the program? According to gdb, code from ld-linux-x86-64.so.2 is mapped into the program address space. Given that I disabled vdso and am including no libraries, is this file necessary to run the program? Could this be the reason for the 17k instructions?

My C program, foo5.c:

int main(){
    char* str = "Hello World";
    return 0;
}

How I compile it:

gcc -nostdlib -nodefaultlibs stubstart.S -o foo5 foo5.c

stubstart.S

.globl _start
_start:call main;
    movl $1, %eax; 
    xorl %ebx, %ebx; 
    int $0x80

perf stat output:

Performance counter stats for './foo5':

              0.60 msec task-clock:u              #    0.015 CPUs utilized          
                 0      context-switches:u        #    0.000 K/sec                  
                 0      cpu-migrations:u          #    0.000 K/sec                  
                11      page-faults:u             #    0.018 M/sec                  
            46,646      cycles:u                  #    0.077 GHz                    
            17,224      instructions:u            #    0.37  insn per cycle         
             5,145      branches:u                #    8.513 M/sec                  
               435      branch-misses:u           #    8.45% of all branches  

Program layout according to gdb:

`/home/foo5', file type elf64-x86-64.
    Entry point: 0x5555555542b1
    0x0000555555554238 - 0x0000555555554254 is .interp
    0x0000555555554254 - 0x0000555555554278 is .note.gnu.build-id
    0x0000555555554278 - 0x0000555555554294 is .gnu.hash
    0x0000555555554298 - 0x00005555555542b0 is .dynsym
    0x00005555555542b0 - 0x00005555555542b1 is .dynstr
    0x00005555555542b1 - 0x00005555555542d5 is .text
    0x00005555555542d5 - 0x00005555555542e1 is .rodata
    0x00005555555542e4 - 0x00005555555542f8 is .eh_frame_hdr
    0x00005555555542f8 - 0x0000555555554330 is .eh_frame
    0x0000555555754f20 - 0x0000555555755000 is .dynamic
    0x00007ffff7dd51c8 - 0x00007ffff7dd51ec is .note.gnu.build-id in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7dd51f0 - 0x00007ffff7dd52c4 is .hash in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7dd52c8 - 0x00007ffff7dd53c0 is .gnu.hash in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7dd53c0 - 0x00007ffff7dd56f0 is .dynsym in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7dd56f0 - 0x00007ffff7dd5914 is .dynstr in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7dd5914 - 0x00007ffff7dd5958 is .gnu.version in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7dd5958 - 0x00007ffff7dd59fc is .gnu.version_d in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7dd5a00 - 0x00007ffff7dd5dd8 is .rela.dyn in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7dd5dd8 - 0x00007ffff7dd5e80 is .rela.plt in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7dd5e80 - 0x00007ffff7dd5f00 is .plt in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7dd5f00 - 0x00007ffff7dd5f08 is .plt.got in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7dd5f10 - 0x00007ffff7df4b20 is .text in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7df4b20 - 0x00007ffff7df9140 is .rodata in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7df9140 - 0x00007ffff7df9141 is .stapsdt.base in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7df9144 - 0x00007ffff7df97b0 is .eh_frame_hdr in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7df97b0 - 0x00007ffff7dfbc24 is .eh_frame in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7ffc680 - 0x00007ffff7ffce64 is .data.rel.ro in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7ffce68 - 0x00007ffff7ffcfd8 is .dynamic in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7ffcfd8 - 0x00007ffff7ffcfe8 is .got in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7ffd000 - 0x00007ffff7ffd050 is .got.plt in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7ffd060 - 0x00007ffff7ffdfd8 is .data in /lib64/ld-linux-x86-64.so.2
    0x00007ffff7ffdfe0 - 0x00007ffff7ffe170 is .bss in /lib64/ld-linux-x86-64.so.2

UPDATE:

In the end, jester's comment about creating a standard executable instead of a PIE to remove the ld.so by adding the -no-pie flag to gcc reduced the perf instruction stat to 12. Then old_timer's -O2 suggestion further reduced it to 7! Thank you everyone.

UPDATE 2: The selected answer of using -static also reduces the instruction count from 17k to 12. Excellent answer.

The article linked by a commenter is relevant and interesting.

Recommended answer

TL:DR: -static is not the default, use that to make an ELF executable that only runs your _start.

-no-pie -nostdlib will also make a static executable simply because it's non-PIE and there are no dynamic libraries to link.

There also is such a thing as -static-pie where the kernel will load your executable to a randomized base address but not run ld.so first (I think), but that's not what you get with -static.

Just to be clear, we're talking about the dynamic instruction count (how many are actually executed in user-space, perf stat -e instructions:u), not a static count (how many are sitting on disk / in memory as part of the executable). A static count only counts instructions inside loops once, and still counts instructions that never execute.

Or at least that's what I'm answering. That makes metadata in other sections, and code that doesn't execute, irrelevant.

According to gdb, code from ld-linux-x86-64.so.2 is mapped into the program address space. Given that I disabled vdso and am including no libraries, is this file necessary to run the program?

You still built a position-independent executable (PIE). This is an ELF shared object with an entry point, so it's still dynamically linked. So the ld.so ELF interpreter runs on it. There's nothing for it to do because you don't actually use any shared libraries, but 17k user-space instructions sounds about right. I get 32606 or 7 instructions for your program on my Arch Linux system (glibc 2.31).

ld.so is started as an "interpreter" for your binary in a similar way to how /bin/sh is started to interpret an executable text file that starts with #!/bin/sh. (Although Linux's ELF program loader still does some of the work of mapping program segments into memory according to the program header of the executable, so ld.so doesn't have to do that manually with system calls.)

You can see this by running under gdb ./foo5 and using starti instead of run to stop before the first user-space instruction. You'll see that you're in ld.so's _start.

Reading symbols from ./foo5...
(No debugging symbols found in ./foo5)
Cannot access memory at address 0x1024   ### note this isn't a real address,
                     ### just an offset relative to the base address / start of the file.
                     ### That's another clue this is a PIE
(gdb) starti

Program stopped.
0x00007ffff7fd3100 in _start () from /lib64/ld-linux-x86-64.so.2

You can also run strace ./foo5 to see the system calls it makes, as an indication that there's a bunch of stuff happening:

$ strace ./foo5
execve("./foo5", ["./foo5"], 0x7ffc12394d90 /* 50 vars */) = 0
brk(NULL)                               = 0x55741b4b7000
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffca69312b0) = -1 EINVAL (Invalid argument)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1d4fc4b000
arch_prctl(ARCH_SET_FS, 0x7f1d4fc4ba80) = 0
mprotect(0x557419622000, 4096, PROT_READ) = 0
strace: [ Process PID=303809 runs in 32 bit mode. ]
exit(0)                                 = ?

(Note the "runs in 32 bit mode"; it doesn't, but strace detected that you used the 32-bit int $0x80 ABI instead of the normal syscall ABI that ld.so used.)

-nostdlib used to imply -static, in GCC configured to not make PIEs by default. But modern distros do configure GCC to make PIEs for security reasons. See 32-bit absolute addresses no longer allowed in x86-64 Linux?

$ file foo5
foo5: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=1ac0a9af247fefebde100695805e5b73f06e891c, not stripped

After building with -static, OTOH:

$ file foo5
foo5: ELF 64-bit LSB executable ...
$ perf stat --all-user ./foo5

 Performance counter stats for './foo5':

              0.03 msec task-clock                #    0.151 CPUs utilized          
                 0      context-switches          #    0.000 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
                 1      page-faults               #    0.030 M/sec                  
             1,930      cycles                    #    0.058 GHz                    
                12      instructions              #    0.01  insn per cycle         
                 4      branches                  #    0.121 M/sec                  
                 0      branch-misses             #    0.00% of all branches        

       0.000219151 seconds time elapsed

       0.000284000 seconds user
       0.000000000 seconds sys

(Odd that perf doesn't print :u for the events when you use --all-user. My system has /proc/sys/kernel/perf_event_paranoid = 0 so if I don't use that, it also counts instructions executed inside the kernel. That varies significantly from run to run, but around 60k total for this static executable.)

I only count 11 user-space instructions that execute, but apparently my i7-6700k counts 12 for that event. (There is hardware support for masking user, kernel, or both for any event counter. This is what perf uses.)

GDB also confirms success:

Reading symbols from ./foo5...
(No debugging symbols found in ./foo5)
Cannot access memory at address 0x401024
(gdb) starti
Starting program: /tmp/foo5

Program stopped.
0x0000000000401000 in _start ()
(gdb) 

The disassembly window from layout reg shows:

│  >0x401000 <_start>       call   0x40100e <main>
│   0x401005 <_start+5>     mov    eax,0x1
│   0x40100a <_start+10>    xor    ebx,ebx
│   0x40100c <_start+12>    int    0x80
│   0x40100e <main>         push   rbp
│   0x40100f <main+1>       mov    rbp,rsp
│   0x401012 <main+4>       lea    rax,[rip+0xfe7]        # 0x402000
│   0x401019 <main+11>      mov    QWORD PTR [rbp-0x8],rax
│   0x40101d <main+15>      mov    eax,0x0
│   0x401022 <main+20>      pop    rbp
│   0x401023 <main+21>      ret

You could have compiled with -O2 to optimize your main down to just an xor eax,eax / ret, or not call it at all so only 3 user-space instructions had to execute.

Or, to optimize your user-space instruction count while still using C, see @mosvy's answer about writing _start in C, with an inline asm _exit(2) that can inline into it.

Note that your _start fails to pass argc and argv to main, although it does have RSP properly 16-byte aligned before a function call. (Because the x86-64 SysV ABI guarantees process entry happens with the stack aligned). You could do that with a mov load and an LEA. Note that since you don't initialize libc, even if you statically linked libc you couldn't call its functions.

See How Get arguments value using inline assembly in C without Glibc? for some hacks. (Basically stand-alone asm _start written in an asm() statement at global scope, or my answer is a total hack on the calling convention.)
