How Do C++ Compilers Merge Identical String Literals

Question

How does the compiler (MS Visual C++ 2010) combine identical string literals that appear in different .cpp source files? For example, suppose the string literal "hello world\n" appears in both src1.cpp and src2.cpp. The compiled exe file contains only one copy of "hello world\n", probably in a constant/read-only section. Is this task done by the linker?

What I hope to achieve: I have some modules written in assembly that are used by C++ modules. These assembly modules contain many long string literal definitions, and I know that they are identical to string literals in the C++ source. If I link the object code generated from my assembly with the object code generated by the compiler, will the linker merge these string literals and remove the redundant copies, as it does when all modules are written in C++?

Recommended answer

(Note the following applies only to MSVC)

My first answer was misleading since I thought that the literal merging was magic done by the linker (and so that the /GF flag would only be needed by the linker).

However, that was a mistake. It turns out the linker has little special involvement in merging string literals. What happens is that when the /GF option is given to the compiler, it puts each string literal in a "COMDAT" section of the object file, under a symbol name that is based on the contents of the literal. So the /GF flag is needed for the compile step, not for the link step.
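One way to observe whether pooling happened is to compare the addresses of two identical literals. The following is a minimal sketch; the function names are hypothetical and not part of the original question. Whether the two pointers compare equal is unspecified by the C++ standard, so the result is expected to depend on whether /GF (or an option that implies it, such as /O1 or /O2) was passed to the compiler.

```cpp
#include <cassert>
#include <cstring>

// Two functions that each contain their own copy of the same literal
// (hypothetical names, for illustration only).
const char* greeting_a() { return "hello world\n"; }
const char* greeting_b() { return "hello world\n"; }

// True when both literals ended up at one address, i.e. they were pooled.
// The C++ standard leaves this unspecified, so either result is valid.
bool literals_share_storage() { return greeting_a() == greeting_b(); }

// The contents always match, whether or not the storage is shared.
void check_text_matches() {
    assert(std::strcmp(greeting_a(), greeting_b()) == 0);
}
```

Moving `greeting_a` and `greeting_b` into separate source files turns this into a check of the cross-file case from the question.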

When you use the /GF option, the compiler places each string literal in the object file in a separate section as a COMDAT object. The various COMDAT objects with the same name will be folded by the linker (I'm not exactly sure about the semantics of COMDAT, or what the linker might do if objects with the same name have different data). So a C file that contains

char* another_string = "this is a string";

will have something like the following in the object file:

SECTION HEADER #3
  .rdata name
       0 physical address
       0 virtual address
      11 size of raw data
     147 file pointer to raw data (00000147 to 00000157)
       0 file pointer to relocation table
       0 file pointer to line numbers
       0 number of relocations
       0 number of line numbers
40301040 flags
         Initialized Data
         COMDAT; sym= "`string'" (??_C@_0BB@LFDAHJNG@this?5is?5a?5string?$AA@)
         4 byte align
         Read Only

RAW DATA #3
  00000000: 74 68 69 73 20 69 73 20 61 20 73 74 72 69 6E 67  this is a string
  00000010: 00      

with the relocation table wiring up the another_string variable name to the literal data.
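The folding step can be pictured with a toy model. This sketch is not how link.exe is implemented; `ComdatSection` and `fold_comdats` are hypothetical names, and the "keep the first section seen for a name, drop the rest" policy is an assumption made for illustration. The real linker's selection rules are not covered here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of one COMDAT section contributed by an object file.
struct ComdatSection {
    std::string symbol;  // decorated symbol name of the section
    std::string data;    // raw bytes of the section
};

// Keep the first section seen for each symbol name and discard later ones.
// Assumed policy for illustration; it does not compare the data at all.
std::map<std::string, std::string> fold_comdats(
        const std::vector<ComdatSection>& inputs) {
    std::map<std::string, std::string> kept;
    for (const ComdatSection& section : inputs) {
        // emplace leaves an already-present entry untouched.
        kept.emplace(section.symbol, section.data);
    }
    return kept;
}

// Two object files that both contain the same literal yield one entry.
void demo_fold() {
    std::vector<ComdatSection> inputs = {
        {"literal_A", "this is a string"},     // from the first object file
        {"literal_A", "this is a string"},     // duplicate from the second
        {"literal_B", "a different string"},
    };
    assert(fold_comdats(inputs).size() == 2);
}
```

Because the model matches on the name alone, it also shows why the symbol name has to be derived from the literal's contents for the merge to be safe.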

Note that the name of the string literal object is clearly based on the contents of the literal string, but with some sort of mangling. I doubt the mangling is documented anywhere (but who knows - maybe someone has reverse engineered it).
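Only the tail of the name can be read directly off the dump: letters are kept as they are, each space appears as `?5`, and the terminating NUL appears as `?$AA`. The sketch below reproduces just that visible substitution. The helper name is hypothetical, the `??_C@_0BB@LFDAHJNG@` prefix is not modelled, and anything the dump does not show is marked as unknown instead of guessed.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: rebuilds only the trailing, content-derived part of
// the symbol name seen in the dump. The prefix of the name is not modelled.
std::string escape_literal_tail(const std::string& bytes) {
    std::string out;
    for (unsigned char c : bytes) {
        if (std::isalnum(c)) {
            out += static_cast<char>(c);  // letters and digits appear unchanged
        } else if (c == ' ') {
            out += "?5";                  // as seen in the dump
        } else if (c == '\0') {
            out += "?$AA";                // as seen in the dump
        } else {
            out += "<?>";                 // escape not visible in the dump
        }
    }
    return out;
}

// The literal from the example, including its terminating NUL (17 bytes).
void demo_escape() {
    const std::string literal("this is a string", 17);
    assert(escape_literal_tail(literal) == "this?5is?5a?5string?$AA");
}
```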

Anyway, if you want literals in an assembly file to be treated in the same manner, you'd need to arrange for the literals to be placed in the object file in the same manner. I honestly don't know what (if any) mechanism the assembler might have for that. Placing an object in a "COMDAT" section is probably pretty easy - getting the name of the object to be based on the string contents (and mangled in the appropriate manner) is another story.

Unless there's some assembly directive/keyword that specifically supports this scenario, I think you might be out of luck. There certainly might be one, but I'm sufficiently rusty with ml.exe to have no idea, and a quick look at the skimpy MSDN docs for ml.exe didn't have anything jump out.

However, if you're willing to put the string literals in a C file and refer to them from your assembly code via externs, it should work. That is essentially what Mark Ransom advocates in his comments on the question.
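A minimal sketch of that workaround, written as C++ with C linkage; the variable and function names are hypothetical. The shared file exports a pointer to an ordinary string literal, so the literal itself is emitted by the compiler like any other and remains eligible for pooling under /GF. The assembly side would declare the symbol as external and load the pointer from it; the exact directive and any name decoration depend on the assembler and target and are not shown here.

```cpp
#include <cassert>
#include <cstring>

// Shared-strings file (hypothetical). extern "C" gives the pointer an
// undecorated C name that hand-written assembly can reference.
extern "C" const char* const shared_greeting = "hello world\n";

// Ordinary C++ code elsewhere that uses the same literal text.
const char* local_greeting() { return "hello world\n"; }

// The text is identical either way; whether the two addresses coincide
// depends on the compiler options, as described above.
void demo_shared() {
    assert(std::strcmp(shared_greeting, local_greeting()) == 0);
}
```

Note that defining the string as an array (`const char name[] = "..."`) would create a separate object instead of a literal, so the pointer form is used here.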
