clang compiler produces different object files from same sources

Problem description

I have a simple hello world Objective-C library:

hello.m:

#import <Foundation/Foundation.h>
#import "hello.h"

void sayHello()
{
    #ifdef FRENCH
    NSString *helloWorld = @"Bonjour Monde!\n";
    #else
    NSString *helloWorld = @"Hello World!\n";
    #endif
    NSFileHandle *stdout = [NSFileHandle fileHandleWithStandardOutput];
    NSData *strData = [helloWorld dataUsingEncoding: NSASCIIStringEncoding];
    [stdout writeData: strData];
}

The hello.h file looks like this:

int main (int argc, const char * argv[]);
int sum(int a, int b);
void sayHello();

This compiles just fine on OS X and Linux using clang and gcc.

Now my question:

When running a clean compile of hello.m multiple times with clang on Ubuntu, the generated hello.o can differ. This does not seem to be related to a timestamp, because even after a second or more, the generated .o file can have the same checksum. From my naive point of view, this looks like completely random/unpredictable behaviour.

I ran the compilation with -S to inspect the generated assembler code. The assembler code also differs (as expected). A diff of the two assembler outputs can be found here: http://pastebin.com/uY1LERGX

At first glance, it looks like only the ordering differs in the assembler code.

This does not happen when compiling with gcc.

Is there a way to tell clang to generate exactly the same .o file each time, like gcc does?

clang --version: 
Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0)


Recommended answer

The property of a compiler always producing the same output from the same input is called reproducible builds or deterministic compilation.

One possible source of instability in compiler output is ASLR (address space layout randomization). Sometimes the compiler, or a library it uses, reads object addresses and uses them, for example as keys of hashes or maps, or when sorting objects by address. When the compiler iterates over such a hash, it visits objects in an order that depends on their addresses, and ASLR places objects at different addresses on each run. The effect can look like the reordered symbols in your diff (the .quad entries).
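The effect described above can be sketched in standard C++, without any LLVM code. This is only an illustration, not how clang is implemented: Symbol, emit_in_pointer_order and emit_in_insertion_order are hypothetical names. The first function walks a pointer-keyed std::set, so its output order follows the addresses of the objects; the second keeps the order in which the objects were first seen.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a compiler-internal object such as a symbol.
struct Symbol {
    std::string name;
};

// Emits names by walking a pointer-keyed std::set. The set is ordered by
// address, so the result depends on where each Symbol lives in memory.
std::vector<std::string> emit_in_pointer_order(const std::vector<const Symbol*>& seen) {
    std::set<const Symbol*> by_address(seen.begin(), seen.end());
    std::vector<std::string> out;
    for (const Symbol* s : by_address) out.push_back(s->name);
    return out;
}

// Emits names in the order they were first seen, independent of addresses.
std::vector<std::string> emit_in_insertion_order(const std::vector<const Symbol*>& seen) {
    std::set<const Symbol*> already;  // used for membership only, never iterated
    std::vector<std::string> out;
    for (const Symbol* s : seen)
        if (already.insert(s).second) out.push_back(s->name);
    return out;
}
```

If the Symbol objects are heap-allocated and the allocator's addresses change between runs, the first function may return a different order on each run, while the second always returns the same one.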

You can disable Linux ASLR globally with echo 0 | sudo tee /proc/sys/kernel/randomize_va_space. A local way of disabling ASLR on Linux is

 setarch `uname -m` -R /bin/bash


The man page for setarch says: -R, --addr-no-randomize: Disables randomization of the virtual address space (turns on ADDR_NO_RANDOMIZE).

For OS X 10.6 there is the DYLD_NO_PIE environment variable (check man dyld; possible usage in bash: export DYLD_NO_PIE=1). In 10.7 and newer there are several options: the -no_pie linker flag, used when building LLVM itself; the _POSIX_SPAWN_DISABLE_ASLR flag, set with posix_spawnattr_setflags before starting llvm; or the script http://src.chromium.org/viewvc/chrome/trunk/src/build/mac/change_mach_o_flags.py with the --no-pie option, which clears the PIE flag from the llvm binaries (thanks to the asan people).

There were some bugs in clang and llvm that prevent/prevented them from being completely deterministic, for example:

  • [cfe-dev] clang: not deterministic anymore? - Nov 3, 2009: non-determinism was detected on code from LLVM bug 5355. The author says the non-determinism was present only with the -g option enabled.
  • [LLVMdev] Deterministic code generation and llvm::Iterators (2010)
  • [llvm-commits] Fix some TableGen non-deterministic behavior. (Sep 2012)
  • r196520 - Fix non-deterministic behavior. - SLPVectorizer was made deterministic only on Dec 5, 2013 (replaced SmallSet with VectorSet)
  • 190793 - TableGen: give asm match classes deterministic order. "TableGen was sorting the entries in some of its internal data structures by pointer." - Sep 16, 2013
  • LLVM bug 14901 is a case where the order of compiler warnings was non-deterministic (Jan 2013).

The patch for bug 14901 contains comments about non-deterministic iteration over llvm::DenseMap:

-  typedef llvm::DenseMap<const VarDecl *, std::pair<UsesVec*, bool> > UsesMap;
+  typedef std::pair<UsesVec*, bool> MappedType;
+  // Prefer using MapVector to DenseMap, so that iteration order will be
+  // the same as insertion order. This is needed to obtain a deterministic
+  // order of diagnostics when calling flushDiagnostics().
+  typedef llvm::MapVector<const VarDecl *, MappedType> UsesMap;
...
-    // FIXME: This iteration order, and thus the resulting diagnostic order,
-    //        is nondeterministic.
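The idea behind the container swap in this patch can be sketched in standard C++. The class below is a simplified, hypothetical stand-in (InsertionOrderedMap is not an LLVM name, and this is not llvm::MapVector's actual implementation): a hash index gives fast lookup, while a vector remembers insertion order and is the only thing that iteration ever walks.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Simplified, hypothetical stand-in for an insertion-ordered map: a hash
// index for lookup plus a vector that records insertion order.
template <typename Key, typename Value>
class InsertionOrderedMap {
public:
    // Returns the value for key, default-constructing it on first use.
    // The reference is invalidated by a later insertion of a new key.
    Value& operator[](const Key& key) {
        auto found = index_.find(key);
        if (found == index_.end()) {
            found = index_.emplace(key, entries_.size()).first;
            entries_.emplace_back(key, Value());
        }
        assert(found->second < entries_.size());
        return entries_[found->second].second;
    }

    std::size_t size() const { return entries_.size(); }

    // Iteration walks the vector, so the order never depends on hashing
    // or on the numeric value of pointer keys.
    typename std::vector<std::pair<Key, Value>>::const_iterator begin() const {
        return entries_.begin();
    }
    typename std::vector<std::pair<Key, Value>>::const_iterator end() const {
        return entries_.end();
    }

private:
    std::unordered_map<Key, std::size_t> index_;
    std::vector<std::pair<Key, Value>> entries_;
};
```

With pointer keys, a range-for loop over such a map visits entries in the order they were first inserted, which is what makes the resulting diagnostic order repeatable. The price is the extra vector, in line with the "somewhat expensive" remark in the LLVM documentation.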

The LLVM documentation says that there are non-deterministic and deterministic variants of several internal containers, like Map vs MapVector (trunk/docs/ProgrammersManual.rst):

    The difference between SetVector and other sets is that the order of iteration
    is guaranteed to match the order of insertion into the SetVector.  This property
    is really important for things like sets of pointers.  Because pointer values
    are non-deterministic (e.g. vary across runs of the program on different
    machines), iterating over the pointers in the set will not be in a well-defined
    order.

    The drawback of SetVector is that it requires twice as much space as a normal
    set and has the sum of constant factors from the set-like container and the
    sequential container that it uses.  Use it **only** if you need to iterate over
    the elements in a deterministic order.

...

    StringMap iteration order, however, is not guaranteed to be deterministic, so
    any uses which require that should instead use a std::map.

...

    ``MapVector<KeyT,ValueT>`` provides a subset of the DenseMap interface.  The
    main difference is that the iteration order is guaranteed to be the insertion
    order, making it an easy (but somewhat expensive) solution for non-deterministic
    iteration over maps of pointers.

It is possible that some LLVM authors thought their code did not need a deterministic iteration order. For example, there are comments in ARMTargetStreamer about the usage of MapVector for ConstantPools (ARMTargetStreamer.cpp, class AssemblerConstantPools). But how can we be sure that no use of a non-deterministic container like DenseMap affects the compiler's output? There are dozens of loops iterating over DenseMap: search for the regex "DenseMap.*const_iterator" on codesearch.debian.net.

Your version of LLVM and clang (3.0, from 2011-11-30) is clearly too old to have all the determinism improvements from 2012 and 2013 (some are listed in this answer). You should update LLVM and clang, then recheck your program for deterministic compilation. If the output still differs, locate the non-determinism in a shorter, easier-to-reproduce example (e.g. save the bc, that is bitcode, from intermediate stages); then you can post a bug in the LLVM bugzilla.
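Rechecking for deterministic compilation comes down to building the same source twice and comparing the outputs byte for byte. A minimal sketch of that comparison in standard C++ follows; same_bytes is a hypothetical helper name, and any checksum tool would serve the same purpose.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Returns true when both files can be opened and hold identical bytes.
bool same_bytes(const std::string& path_a, const std::string& path_b) {
    std::ifstream a(path_a, std::ios::binary);
    std::ifstream b(path_b, std::ios::binary);
    if (!a || !b) return false;  // a missing file counts as "not the same"

    std::istreambuf_iterator<char> a_it(a), b_it(b), end;
    // Walk both streams together and stop at the first differing byte.
    while (a_it != end && b_it != end) {
        if (*a_it != *b_it) return false;
        ++a_it;
        ++b_it;
    }
    // Equal only if both files ended at the same position.
    return a_it == end && b_it == end;
}
```

For example, after compiling hello.m twice into two differently named object files (the file names are up to you), same_bytes on the two paths tells you whether that pair of builds was reproducible. Repeating the check over many builds is worthwhile, since the question notes that the output only sometimes differs.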
