How to find global static initializations


Problem description

I just read this excellent article: http://neugierig.org/software/chromium/notes/2011/08/static-initializers.html and then I tried: https://gcc.gnu.org/onlinedocs/gccint/Initialization.html

What it says about finding initializers does not work for me though. The .ctors section is not available, but I could find .init_array (see also Can't find .dtors and .ctors in binary). But how do I interpret the output? I mean, summing up the size of the pages can also be handled by the size command and its .bss column - or am I missing something?

Furthermore, nm does not report any *_GLOBAL__I_* symbols, only *_GLOBAL__N_* functions, and - more interesting - _GLOBAL__sub_I_somefile.cpp entries. The latter probably indicates files with global initialization. But can I somehow get a list of constructors that are being run? Ideally, a tool would give me a list of

Foo::Foo in file1.cpp:12
Bar::Bar in file2.cpp:45
...

(assuming I have debug symbols available). Is there such a tool? If not, how could one write it? Does the .init_array section contain pointers to code which could be translated via some DWARF magic to the above?

Solution

As you already observed, the implementation details of constructors/initialization functions are highly compiler (and compiler version) dependent. While I am not aware of a tool for this, what current GCC/clang versions do is simple enough to let a small script do the job: .init_array is just a list of entry points. objdump -s can be used to dump the list, and nm to look up the symbol names. Here's a Python script that does that. It should work for any binary that was generated by the said compilers, although as written it assumes x86 or x86-64 output, since it checks for "x86-64" in the objdump header and searches the disassembly for callq:

#!/usr/bin/env python
import os
import sys

# Load .init_array section
objdump_output = os.popen("objdump -s '%s' -j .init_array" % (sys.argv[1].replace("'", r"\'"),)).read()
is_64bit = "x86-64" in objdump_output
init_array = objdump_output[objdump_output.find("Contents of section .init_array:") + 33:]
initializers = []
for line in init_array.split("\n"):
    parts = line.split()
    if not parts:
        continue
    parts.pop(0)  # Remove offset
    parts.pop(-1) # Remove ascii representation

    if is_64bit:
        # 64bit pointers are 8 bytes long
        parts = [ "".join(parts[i:i+2]) for i in range(0, len(parts), 2) ]

    # Fix endianness (the section dump shows bytes in little-endian order)
    parts = [ "".join(reversed([ x[i:i+2] for i in range(0, len(x), 2) ])) for x in parts ]

    initializers += parts

# Load disassembly for c++ constructors
dis_output = os.popen("objdump -d '%s' | c++filt" % (sys.argv[1].replace("'", r"\'"), )).read()
def find_associated_constructor(disassembly, symbol):
    # Find associated __static_initialization function
    loc = disassembly.find("<%s>" % symbol)
    if loc < 0:
        return False
    loc = disassembly.find(" <", loc)
    if loc < 0:
        return False
    symbol = disassembly[loc+2:disassembly.find("\n", loc)][:-1]
    if symbol[:23] != "__static_initialization":
        return False
    address = disassembly[disassembly.rfind(" ", 0, loc)+1:loc]
    loc = disassembly.find("%s <%s>" % (address, symbol))
    if loc < 0:
        return False
    # Find all callq's in that function
    end_of_function = disassembly.find("\n\n", loc)
    symbols = []
    while loc < end_of_function:
        loc = disassembly.find("callq", loc)
        if loc < 0 or loc > end_of_function:
            break
        loc = disassembly.find("<", loc)
        symbols.append(disassembly[loc+1:disassembly.find("\n", loc)][:-1])
    return symbols

# Load symbol names, if available
nm_output = os.popen("nm '%s'" % (sys.argv[1].replace("'", r"\'"), )).read()
nm_symbols = {}
for line in nm_output.split("\n"):
    parts = line.split()
    if not parts:
        continue
    nm_symbols[parts[0]] = parts[-1]

# Output a list of initializers
print("Initializers:")
for initializer in initializers:
    symbol = nm_symbols[initializer] if initializer in nm_symbols else "???"
    constructor = find_associated_constructor(dis_output, symbol)
    if constructor:
        for function in constructor:
            print("%s %s -> %s" % (initializer, symbol, function))
    else:
        print("%s %s" % (initializer, symbol))
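The joining and byte-swapping steps in the script can also be written with int.from_bytes, which may be easier to follow. This is only a sketch, assuming Python 3 and little-endian pointers; the function name decode_pointers and its parameters are made up for illustration and are not part of the script above.

```python
def decode_pointers(hex_words, pointer_size=8, byteorder="little"):
    """Decode objdump-style hex words into zero-padded pointer strings.

    hex_words: the hex columns of one or more `objdump -s` rows, with the
    offset and ASCII columns already removed, e.g. ["30110000", "00000000"].
    pointer_size and byteorder are assumptions the caller has to supply.
    """
    raw = bytes.fromhex("".join(hex_words))
    width = pointer_size * 2  # two hex digits per byte
    pointers = []
    # A trailing fragment shorter than one pointer is ignored.
    for start in range(0, len(raw) - pointer_size + 1, pointer_size):
        value = int.from_bytes(raw[start:start + pointer_size], byteorder)
        pointers.append(format(value, "0{}x".format(width)))
    return pointers
```

The returned strings are lowercase and zero-padded to the pointer width, so for a 64-bit binary they should be usable as keys into nm_symbols in the same way as the values the script computes.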

C++ static initializers are not called directly, but through two generated functions, _GLOBAL__sub_I_.. and __static_initialization... The script uses the disassembly of those functions to get the name of the actual constructor. You'll need the c++filt tool to demangle the names, or remove it from the objdump pipeline in the script to see the raw symbol names.
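To get closer to the "file:line" listing asked for in the question, the addresses could be handed to addr2line when debug symbols are available. The following is a rough sketch, not a tested tool: parse_addr2line and resolve_addresses are hypothetical helper names, and it assumes that addr2line with -f prints exactly two lines per address (function name, then file:line), which holds only as long as -i is not used.

```python
import subprocess

def parse_addr2line(output):
    """Pair up the lines printed by `addr2line -f`: function, then file:line."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return list(zip(lines[0::2], lines[1::2]))

def resolve_addresses(binary, addresses):
    """Ask addr2line for (function, 'file:line') of each hex address."""
    if not addresses:
        # Without addresses addr2line would wait for input on stdin.
        return []
    # -e: binary to inspect, -f: print function names, -C: demangle C++ names
    command = ["addr2line", "-e", binary, "-f", "-C"] + list(addresses)
    output = subprocess.check_output(command, universal_newlines=True)
    return parse_addr2line(output)
```

Calling resolve_addresses(sys.argv[1], initializers) would then give one pair per .init_array entry. Whether an entry resolves to a useful source line depends on the debug information the compiler attached to the generated function, so unresolved entries (reported by addr2line as ??) are to be expected.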

Shared libraries can have their own initializer lists, which would not be displayed by this script. The situation is slightly more complicated there: For non-static initializers, the .init_array gets an all-zero entry that is overwritten with the final address of the initializer when loading the library. So this script would output an address with all zeros.
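For such all-zero entries, one possible workaround is to read the dynamic relocations with readelf -r and use the addend recorded for each .init_array slot. The sketch below rests on several assumptions: an x86-64 library, R_X86_64_RELATIVE relocations, the usual readelf -r column layout (offset first, type third, addend last), and the addend being the link-time address of the initializer. The function names are hypothetical, and section_start stands for the address shown in the first column of the objdump -s output, which the script above discards.

```python
def parse_relative_relocations(readelf_output):
    """Map relocation offset -> addend for R_X86_64_RELATIVE entries.

    Assumes the text comes from `readelf -r` on an x86-64 shared library.
    """
    relocations = {}
    for line in readelf_output.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[2] == "R_X86_64_RELATIVE":
            relocations[int(parts[0], 16)] = int(parts[-1], 16)
    return relocations

def fill_zero_entries(section_start, initializers, relocations, pointer_size=8):
    """Replace all-zero entries by the relocation addend of their slot, if any."""
    width = pointer_size * 2
    resolved = []
    for index, entry in enumerate(initializers):
        slot = section_start + index * pointer_size  # address of this entry
        if int(entry, 16) == 0 and slot in relocations:
            entry = format(relocations[slot], "0{}x".format(width))
        resolved.append(entry)
    return resolved
```

Entries without a matching relocation are left untouched, so the output still shows zeros where this heuristic does not apply.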
