How Do C++ Compilers Merge Identical String Literals

Question

How does the compiler (MS Visual C++ 2010) combine identical string literals that appear in different .cpp source files? For example, if I have the string literal "hello world" in both src1.cpp and src2.cpp, the compiled .exe file will probably contain only one "hello world" string literal, in the constant/read-only section. Is this task done by the linker?

What I hope to achieve: I have some modules written in assembly that are used by C++ modules, and these assembly modules contain many long string literal definitions. I know that some of these literals are identical to string literals in the C++ source. If I link the object code generated from my assembly with the object code generated by the compiler, will the linker merge these string literals and remove the redundant copies, as it does when all the modules are written in C++?

Answer

(Note the following applies only to MSVC)

My first answer was misleading, because I thought that literal merging was magic done by the linker (and therefore that the /GF flag would only be needed by the linker).

However, that was a mistake. It turns out the linker has little special involvement in merging string literals. What happens is that when the /GF option is given to the compiler, it puts each string literal in a "COMDAT" section of the object file, with an object name based on the contents of the literal. So the /GF flag is needed for the compile step, not for the link step.

When you use the /GF option, the compiler places each string literal in its own section of the object file, as a COMDAT object. COMDAT objects with the same name are folded into one by the linker (I'm not exactly sure about the semantics of COMDAT, or what the linker might do if objects with the same name have different data). So a C file that contains

char* another_string = "this is a string";

will have something like the following in the object file:

SECTION HEADER #3
  .rdata name
       0 physical address
       0 virtual address
      11 size of raw data
     147 file pointer to raw data (00000147 to 00000157)
       0 file pointer to relocation table
       0 file pointer to line numbers
       0 number of relocations
       0 number of line numbers
40301040 flags
         Initialized Data
         COMDAT; sym= "`string'" (??_C@_0BB@LFDAHJNG@this?5is?5a?5string?$AA@)
         4 byte align
         Read Only

RAW DATA #3
  00000000: 74 68 69 73 20 69 73 20 61 20 73 74 72 69 6E 67  this is a string
  00000010: 00      

with the relocation table wiring up the another_string variable name to the literal data.
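
To make the folding step concrete, here is a toy model (not the real linker): each object file contributes (symbol, bytes) sections, and the "linker" keeps only the first section it sees for each symbol name. Keeping an arbitrary copy corresponds to COFF's "select any" COMDAT selection; whether MSVC actually marks string-literal COMDATs that way is an assumption here, and the names `ComdatSection` and `fold_comdats` are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one COMDAT section holding a string literal.
struct ComdatSection {
    std::string symbol;  // e.g. "??_C@_0BB@LFDAHJNG@this?5is?5a?5string?$AA@"
    std::string bytes;   // raw data, including the terminating NUL
};

// Toy "linker": walks the object files in order and keeps only the first
// section seen for each symbol name. Later sections with the same name are
// discarded whatever their contents ("select any" behaviour, assumed here).
std::vector<ComdatSection> fold_comdats(
        const std::vector<std::vector<ComdatSection>>& object_files) {
    std::set<std::string> seen;
    std::vector<ComdatSection> image;
    for (const auto& obj : object_files) {
        for (const auto& sec : obj) {
            assert(!sec.symbol.empty() && "COMDATs are matched by name");
            if (seen.insert(sec.symbol).second) {  // true only on first sight
                image.push_back(sec);
            }
        }
    }
    return image;
}
```

If src1.obj and src2.obj both carry the "this is a string" section, the folded image holds it once. In this model a same-named section with different bytes is silently dropped, which is one possible answer to the open question above, not a confirmed one.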

Note that the name of the string literal object is clearly based on the contents of the literal, with some mangling applied. The mangling scheme has been partially documented on Wikipedia (see "String constants").
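
As an illustration of what an assembler would have to reproduce, here is a sketch of how the visible parts of the name appear to be built, checked only against the one name in the dump (`??_C@_0BB@LFDAHJNG@this?5is?5a?5string?$AA@`). The prefix `??_C@_0`, the length encoding (`BB@` is 17 = 0x11 bytes, including the NUL) and the escapes `?5` (space) and `?$AA` (NUL) come from that name; the digit form for short lengths and the `?0`..`?9` punctuation table are assumptions taken from the partial documentation. The eight-letter hash (`LFDAHJNG`) is not computed and must be supplied, and long literals and wide strings are not modeled. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Numbers as they appear in mangled names (assumed): 1..10 -> one digit
// '0'..'9' (value minus one); anything else -> hex digits written with
// 'A'..'P' for 0..15, terminated by '@'.
std::string encode_number(std::size_t n) {
    if (n >= 1 && n <= 10) {
        return std::string(1, static_cast<char>('0' + (n - 1)));
    }
    std::string hex;
    do {
        hex.insert(hex.begin(), static_cast<char>('A' + n % 16));
        n /= 16;
    } while (n != 0);
    return hex + "@";
}

// Escape the literal's bytes: letters, digits and '_' pass through, the ten
// characters below become ?0..?9 (assumed table), and every other byte
// becomes ?$ plus two 'A'..'P' hex digits (so the NUL becomes ?$AA).
std::string encode_bytes(const std::string& bytes) {
    static const std::string short_escapes = ",/\\:. \n\t'-";  // ?0 .. ?9
    std::string out;
    for (unsigned char c : bytes) {
        bool plain = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                     (c >= '0' && c <= '9') || c == '_';
        std::size_t pos = short_escapes.find(static_cast<char>(c));
        if (plain) {
            out += static_cast<char>(c);
        } else if (pos != std::string::npos) {
            out += '?';
            out += static_cast<char>('0' + pos);
        } else {
            out += "?$";
            out += static_cast<char>('A' + (c >> 4));
            out += static_cast<char>('A' + (c & 0xF));
        }
    }
    return out;
}

// Assemble a narrow-string literal symbol. The hash is passed in because
// this sketch does not know how the compiler computes it.
std::string string_literal_symbol(const std::string& bytes,
                                  const std::string& hash) {
    assert(!bytes.empty() && bytes.back() == '\0');  // must include the NUL
    return "??_C@_0" + encode_number(bytes.size()) + hash + "@" +
           encode_bytes(bytes) + "@";
}
```

Everything except the hash is mechanical, which is why getting the right name into an assembly-generated object file hinges on knowing how that hash is derived.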

Anyway, if you want literals in an assembly file to be treated the same way, you'd need to arrange for them to be placed in the object file in the same manner. I honestly don't know what (if any) mechanism the assembler might have for that. Placing an object in a "COMDAT" section is probably pretty easy; getting the object's name to be based on the string contents (and mangled in the appropriate manner) is another story.

Unless there's some assembler directive or keyword that specifically supports this scenario, I think you might be out of luck. There certainly might be one, but I'm rusty enough with ml.exe that I have no idea, and a quick look at the skimpy MSDN docs for ml.exe didn't turn up anything.

However, if you're willing to put the string literals in a C file and refer to them from your assembly code via externs, it should work. That, however, is essentially what Mark Ransom advocates in his comments on the question.
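
A minimal sketch of the C/C++ side of that approach, assuming a source file (say strings.cpp) and a hypothetical array name `kHelloWorld`. `extern "C"` gives the array external linkage (a const object at namespace scope is otherwise internal in C++) and an unmangled name the assembly can refer to. On the assembly side it would be declared external, for example with MASM's EXTERN directive (something like `EXTERN kHelloWorld:BYTE`); on 32-bit x86 the C name carries a leading underscore, which as I understand it MASM adds when the C language type is in effect. Check the ml.exe documentation for your setup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// The single definition of a shared literal. extern "C" makes it externally
// visible under the plain name kHelloWorld, so both C++ code and assembly
// can reference this one copy instead of defining their own.
extern "C" const char kHelloWorld[] = "hello world";

// Hypothetical helper: C++ code uses the shared array rather than repeating
// the literal, so only one copy of these bytes ends up in the image.
std::size_t hello_world_length() {
    assert(kHelloWorld[sizeof(kHelloWorld) - 1] == '\0');  // NUL-terminated
    return std::strlen(kHelloWorld);
}
```

Because there is only one definition, there is nothing for the linker to merge, and the result does not depend on /GF or on COMDAT naming.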
