Does a compiler always produce assembly code?


Question


From Thinking in C++ - Vol 1:

In the second pass, the code generator walks through the parse tree and generates either assembly language code or machine code for the nodes of the tree.

Well, at least in GCC, if we give the option for generating assembly code, the compiler obeys by creating a file containing assembly code. But when we simply run the command gcc without any options, does it not produce the assembly code internally?

If yes, then why does it need to first produce assembly code and then translate it to machine language?

Answer

TL;DR: different object file formats and easier portability to new Unix platforms (historically) are among the main reasons gcc keeps the assembler separate from the compiler, I think. Outside of gcc, the mainstream x86 C and C++ compilers (clang/LLVM, MSVC, ICC) go straight to machine code, with the option of printing asm text if you ask them to.

LLVM and MSVC are, or come with, complete toolchains, not just compilers: they also include an assembler and a linker. LLVM already has object-file handling as a library function, so it can use that instead of writing out asm text to feed to a separate program.

Smaller projects often choose to leave object-file format details to the assembler. For example, FreePascal can go straight to an object file on a few of its target platforms, but otherwise only to asm. There are many claims (1, 2, 3, 4) that almost all compilers go through asm text, but that's not true for many of the biggest, most widely used compilers (GCC excepted), the ones that have lots of developers working on them.

C compilers tend to be either single-platform compilers (like a vendor's compiler for a microcontroller) that were written as "the/a C implementation for this platform", or very large projects like LLVM, where machine code generation isn't a big fraction of the compiler's own code size. Compilers for less widely used languages are usually more portable, but their authors don't want to write their own machine-code / object-file handling. (Many compilers these days are front-ends for LLVM, so they get .o output for free, like rustc, but older compilers didn't have that option.)

Out of all compilers ever, most do go to asm. But if you weight by how often each one is used every day, going straight to a relocatable object file (.o / .obj) accounts for a significant fraction of the total builds done on any given day worldwide. That is, the compiler you care about if you're reading this might well work this way.

Also, compilers like javac that target a portable bytecode format have less reason to use asm; the same output file and bytecode format work across every platform they have to run on.


Why GCC does what it does

Yes, the assembler (as) is a separate program: the gcc front-end runs it as a separate step after cc1 (the C preprocessor+compiler that produces text asm).

This makes gcc slightly more modular, making the compiler itself a text -> text program.
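The split described above can be modelled with a toy pipeline. This is only a sketch under loud assumptions: the three-instruction stack machine, its opcode numbers, and the function names compile_to_asm and assemble are all invented for illustration and have nothing to do with the real cc1 or as. What it tries to show is the shape of the design: the compiler stage produces nothing but text, and a separate stage turns that text into bytes.

```python
# Toy model of a compiler that emits text asm plus a separate assembler.
# The stack machine and its opcode values are hypothetical, not a real ISA.

OPCODES = {"push": 0x01, "add": 0x02, "mul": 0x03}


def _emit_lines(expr):
    """Post-order walk: operands first, then the operator."""
    if isinstance(expr, int):
        return [f"push {expr}"]
    op, left, right = expr
    mnemonic = {"+": "add", "*": "mul"}[op]
    return _emit_lines(left) + _emit_lines(right) + [mnemonic]


def compile_to_asm(expr):
    """Compiler stage: expression tree in, assembly *text* out.

    An expression is an int or a tuple (op, left, right), op in "+", "*".
    """
    return "\n".join(_emit_lines(expr)) + "\n"


def assemble(asm_text):
    """Assembler stage: assembly text in, encoded bytes out."""
    out = bytearray()
    for line in asm_text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        out.append(OPCODES[parts[0]])
        if parts[0] == "push":
            # One-byte immediate operand, so values must fit in 0..255.
            out.append(int(parts[1]))
    return bytes(out)


# (1 + 2) * 3
asm_text = compile_to_asm(("*", ("+", 1, 2), 3))
machine_code = assemble(asm_text)
print(asm_text, end="")
print(machine_code.hex())
```

Because the only interface between the two stages is a string, either one could be swapped out (for example, for a platform's own assembler) without the other noticing, which is the modularity argument being made here.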

GCC internally uses some binary data structures for GIMPLE and RTL internal representations, but it doesn't write (text representations of) those IR formats to files unless you use a special option for debugging.

So why stop at assembly? This means GCC doesn't need to know about different object file formats for the same target. For example, different x86-64 OSes use ELF, PE/COFF, or MachO64 object files, and historically a.out. The assembler (as) turns the same text asm into the same machine code, surrounded by different object file metadata on different targets. (There are minor differences gcc has to know about, like whether to prepend an _ to symbol names, whether 32-bit absolute addresses can be used, and whether code has to be PIC.)
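To see that "different metadata" in practice, one can look at the first few bytes of whatever object file the assembler produced. The following is a minimal sketch, not a robust detector: the function names are made up for this example, it only knows the three x86-64 formats named above, and the COFF check is a heuristic (a COFF object has no magic number, so it just tests the leading Machine field for the AMD64 value). a.out is deliberately left out.

```python
# Guess an object file's container format from its first bytes.
# Heuristic sketch: only the three x86-64 formats discussed here.
import struct


def object_format(header: bytes) -> str:
    """Return "ELF", "Mach-O 64", "COFF x86-64" or "unknown"."""
    if header[:4] == b"\x7fELF":  # ELF identification bytes
        return "ELF"
    if len(header) >= 4:
        (magic,) = struct.unpack("<I", header[:4])
        if magic == 0xFEEDFACF:  # MH_MAGIC_64 as stored on little-endian targets
            return "Mach-O 64"
    if len(header) >= 2:
        (machine,) = struct.unpack("<H", header[:2])
        if machine == 0x8664:  # IMAGE_FILE_MACHINE_AMD64
            return "COFF x86-64"
    return "unknown"


def object_format_of_file(path: str) -> str:
    """Convenience wrapper: sniff the first four bytes of a file on disk."""
    with open(path, "rb") as f:
        return object_format(f.read(4))


print(object_format(b"\x7fELF\x02\x01\x01\x00"))
print(object_format(bytes.fromhex("cffaedfe")))
print(object_format(bytes.fromhex("6486")))
```

Calling object_format_of_file on a .o or .obj built from the same source on different systems should report different containers, even though the instructions inside are the same.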

Any platform-specific quirks can be left to GNU binutils as (aka GAS), or gcc can use the vendor-supplied assembler that comes with a system.

Historically, there were many different Unix systems with different CPUs, or especially the same CPU but with different quirks in their object file formats. More importantly, their assemblers shared a fairly compatible set of directives like .globl main, .asciiz "Hello World!\n", and similar. GAS syntax comes from Unix assemblers.

It really was possible in the past to port GCC to a new Unix platform without porting GNU as, just using the assembler that came with the OS.

Nobody has ever gotten around to integrating an assembler as a library into GCC's cc1 compiler. That's been done for the C preprocessor (which historically was also done in a separate process), but not the assembler.


Most other compilers do produce object files directly from the compiler, without a text asm temporary file / pipe, often because the compiler was only designed for one or a couple of targets, like MSVC or ICC or various compilers that started out as x86-only, or many vendor-supplied compilers for embedded chips.

clang/LLVM was designed much more recently than GCC. It was designed to work as an optimizing JIT back-end, so it needed a built-in assembler to make it fast to generate machine code. For it to also work as an ahead-of-time compiler, adding support for different object-file formats was presumably a minor thing, since the internal software architecture for going straight to binary machine code was already there.
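As a rough illustration of "straight to machine code, with text only on request", here is a toy sketch with one code generator and two interchangeable emitters. Everything in it is hypothetical: the stack machine, the opcode values, and the class names BinaryEmitter and TextEmitter are invented for this example and are not LLVM's API.

```python
# Toy model: one code generator, two interchangeable back ends.
# The stack machine, opcode values and class names are hypothetical.

OPCODES = {"push": 0x01, "add": 0x02, "mul": 0x03}


class BinaryEmitter:
    """Encodes each instruction straight into bytes; no text is produced."""

    def __init__(self):
        self.code = bytearray()

    def emit(self, mnemonic, operand=None):
        self.code.append(OPCODES[mnemonic])
        if operand is not None:
            self.code.append(operand)  # one-byte immediate, 0..255


class TextEmitter:
    """Collects the same instructions as asm text lines instead."""

    def __init__(self):
        self.lines = []

    def emit(self, mnemonic, operand=None):
        self.lines.append(mnemonic if operand is None else f"{mnemonic} {operand}")


def generate(expr, emitter):
    """Walk an expression tree: an int, or a tuple (op, left, right)."""
    if isinstance(expr, int):
        emitter.emit("push", expr)
        return
    op, left, right = expr
    generate(left, emitter)
    generate(right, emitter)
    emitter.emit({"+": "add", "*": "mul"}[op])


tree = ("*", ("+", 1, 2), 3)

binary = BinaryEmitter()
generate(tree, binary)  # default path: bytes, with no text to parse back

text = TextEmitter()
generate(tree, text)    # opt-in path, like asking a compiler for asm output

print(bytes(binary.code).hex())
print("\n".join(text.lines))
```

The design choice this highlights is that the binary path never formats or re-parses text, which matters when code has to be generated quickly, as in a JIT.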

LLVM of course uses LLVM-IR internally for target-independent optimizations before looking for back-end-specific optimizations, but again it only writes out this format as text if you ask it to.

