Does a compiler always produce assembly code?


Question


From Thinking in C++ - Vol 1:

In the second pass, the code generator walks through the parse tree and generates either assembly language code or machine code for the nodes of the tree.

At least in GCC, if we pass the option for generating assembly code, the compiler obeys by creating a file containing assembly code. But when we simply run the command gcc without any options, does it not produce the assembly code internally?

If yes, then why does it need to first produce assembly code and then translate it to machine language?

Answer

TL;DR: different object file formats and easier portability to new Unix platforms (historically) are among the main reasons gcc keeps the assembler separate from the compiler, I think. Outside of gcc, the mainstream x86 C and C++ compilers (clang/LLVM, MSVC, ICC) go straight to machine code, with the option of printing asm text if you ask them to.

LLVM and MSVC are, or come with, complete toolchains, not just compilers (they also include an assembler and a linker). LLVM already has object-file handling as a library function, so it can use that instead of writing out asm text to feed to a separate program.

Smaller projects often choose to leave object-file format details to the assembler. For example, FreePascal can go straight to an object file on a few of its target platforms, but otherwise only to asm. There are many claims (1, 2, 3, 4) that almost all compilers go through asm text, but that's not true for many of the biggest, most widely used compilers (except GCC), which have lots of developers working on them.

C compilers tend to either target a single platform only (like a vendor's compiler for a microcontroller), written as "the/a C implementation for this platform", or be very large projects like LLVM, where including machine code generation isn't a big fraction of the compiler's own code size. Compilers for less widely used languages are more often portable, but their authors don't want to write their own machine-code / object-file handling. (Many compilers these days are front-ends for LLVM, so they get .o output for free, like rustc, but older compilers didn't have that option.)

Out of all compilers ever, most do go to asm. But if you weight by how often each one is used every day, going straight to a relocatable object file (.o / .obj) accounts for a significant fraction of the total builds done on any given day worldwide. That is, the compiler you care about if you're reading this might well work this way.

Also, compilers like javac that target a portable bytecode format have less reason to use asm; the same output file and bytecode format work across every platform they have to run on.
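As a rough analogy (using Python's own bytecode compiler rather than javac, which is an assumption made purely for illustration), the sketch below shows a compiler that goes straight to a binary bytecode format and only produces a textual listing when asked. The names `binary` and `listing` are made up for this example.

```python
import dis
import io

# Compile an expression straight to a code object; no text "assembly" is involved.
code = compile("a + b", "<example>", "eval")

# The bytecode itself is a platform-independent bytes object.
binary = code.co_code

# A human-readable listing exists only because we explicitly ask for it.
buffer = io.StringIO()
dis.dis(code, file=buffer)
listing = buffer.getvalue()

print(type(binary).__name__, len(binary), "bytes of bytecode")
print(listing)
```

The exact opcodes in the listing vary between Python versions, but the shape of the workflow is the same: binary output by default, text on request.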



Why GCC does what it does

Yes, the assembler (as) is a separate program that the gcc front-end actually runs separately from cc1 (the C preprocessor+compiler that produces text asm).

This makes gcc slightly more modular, making the compiler itself a text -> text program.
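To make the "text -> text" idea concrete, here is a toy sketch of a code generator that walks a parse tree and emits asm text. It is in no way how cc1 works internally: it borrows Python's `ast` module as the parser, handles only integer `+` and `*`, and emits naive stack-style x86-64 instructions in AT&T syntax. The function name `emit_asm` is hypothetical.

```python
import ast


def emit_asm(expr: str) -> str:
    """Walk the parse tree of an integer expression and return x86-64 asm text."""
    lines = []

    def walk(node):
        if isinstance(node, ast.Constant) and type(node.value) is int:
            # Leaf node: push the literal (assumed to fit in a 32-bit immediate).
            lines.append(f"    pushq ${node.value}")
        elif isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mult)):
            # Post-order walk: evaluate both operands, then combine them.
            walk(node.left)
            walk(node.right)
            lines.append("    popq %rcx")  # right operand
            lines.append("    popq %rax")  # left operand
            op = "addq" if isinstance(node.op, ast.Add) else "imulq"
            lines.append(f"    {op} %rcx, %rax")
            lines.append("    pushq %rax")
        else:
            raise ValueError("unsupported expression")

    walk(ast.parse(expr, mode="eval").body)
    lines.append("    popq %rax")  # final result ends up in %rax
    return "\n".join(lines) + "\n"


print(emit_asm("1 + 2 * 3"))
```

The output is plain text, so turning it into machine code and wrapping it in an object file is left entirely to a separate assembler, which is the division of labour described above.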

GCC internally uses some binary data structures for GIMPLE and RTL internal representations, but it doesn't write (text representations of) those IR formats to files unless you use a special option for debugging.

So why stop at assembly? This means GCC doesn't need to know about different object file formats for the same target. For example, different x86-64 OSes use ELF, PE/COFF, and MachO64 object files, and historically a.out. The assembler (as) turns the same text asm into the same machine code, surrounded by different object file metadata on different targets. (There are minor differences gcc has to know about, like whether to prepend an _ to symbol names or not, whether 32-bit absolute addresses can be used, and whether code has to be PIC.)
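One of those minor differences, the leading underscore on symbol names, can be sketched as follows. This is a simplified illustration, not GCC's actual logic: the function name `emit_function_header` and the two-entry table are assumptions, modelling only ELF (no prefix) and Mach-O (leading underscore on C symbols).

```python
def emit_function_header(name: str, object_format: str) -> str:
    """Return the asm text that declares and labels a global function."""
    # Assumption: only two object formats are modelled here.
    prefixes = {"elf": "", "macho": "_"}
    if object_format not in prefixes:
        raise ValueError(f"unknown object format: {object_format}")
    symbol = prefixes[object_format] + name
    return f"    .globl {symbol}\n{symbol}:\n"


print(emit_function_header("main", "elf"))
print(emit_function_header("main", "macho"))
```

Everything else about the object file (headers, section tables, relocation encoding) stays out of the compiler's sight; it only has to get the text right.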

Any platform-specific quirks can be left to GNU binutils as (aka GAS), or gcc can use the vendor-supplied assembler that comes with a system.

Historically, there were many different Unix systems with different CPUs, or especially the same CPU but different quirks in their object file formats. More importantly, their assemblers accepted a fairly compatible set of directives, like .globl main, .asciiz "Hello World! ", and similar. GAS syntax comes from Unix assemblers.

It really was possible in the past to port GCC to a new Unix platform without porting as, just using the assembler that comes with the OS.

Nobody has ever gotten around to integrating an assembler as a library into GCC's cc1 compiler. That's been done for the C preprocessor (which historically was also done in a separate process), but not the assembler.


Most other compilers do produce object files directly from the compiler, without a text asm temporary file / pipe. That's often because the compiler was only designed for one or a couple of targets, like MSVC or ICC or various compilers that started out as x86-only, or many vendor-supplied compilers for embedded chips.

clang/LLVM was designed much more recently than GCC. It was designed to work as an optimizing JIT back-end, so it needed a built-in assembler to make it fast to generate machine code. To work as an ahead-of-time compiler, adding support for different object-file formats was presumably a minor thing since the internal software architecture was there to go straight to binary machine code.

LLVM of course uses LLVM-IR internally for target-independent optimizations before looking for back-end-specific optimizations, but again it only writes out this format as text if you ask it to.

