Segfault on declaring a variable of type vector<shared_ptr<int>>


Problem description


Code

Here is the program that gives the segfault.

#include <iostream>
#include <vector>
#include <memory>

int main() 
{
    std::cout << "Hello World" << std::endl;

    std::vector<std::shared_ptr<int>> y {};  

    std::cout << "Hello World" << std::endl;
}

Of course, there is absolutely nothing wrong with the program itself. The root cause of the segfault depends on the environment in which it is built and run.


Background

We, at Amazon, use a build system which builds and deploys the binaries (lib and bin) in an almost machine-independent way. In our case, that basically means it deploys the executable (built from the above program) into $project_dir/build/bin/ and almost all of its dependencies (i.e. the shared libraries) into $project_dir/build/lib/. I say "almost" because a few shared libraries, such as libc.so, libm.so, ld-linux-x86-64.so.2 and possibly a few others, are picked up from the system (i.e. from /lib64). Note that libstdc++ is supposed to be picked up from $project_dir/build/lib, though.

Now I run it as follows:

$ LD_LIBRARY_PATH=$project_dir/build/lib ./build/bin/run

segmentation fault

However, if I run it without setting LD_LIBRARY_PATH, it runs fine.


Diagnostics

1. ldd

Here is the ldd output for both cases (please note that I've edited the output to show the full version of the libraries wherever there is a difference):

$ LD_LIBRARY_PATH=$project_dir/build/lib ldd ./build/bin/run

linux-vdso.so.1 =>  (0x00007ffce19ca000)
libstdc++.so.6 => $project_dir/build/lib/libstdc++.so.6.0.20 
libgcc_s.so.1 =>  $project_dir/build/lib/libgcc_s.so.1 
libc.so.6 => /lib64/libc.so.6 
libm.so.6 => /lib64/libm.so.6 
/lib64/ld-linux-x86-64.so.2 (0x0000562ec51bc000)

and without LD_LIBRARY_PATH:

$ ldd ./build/bin/run

linux-vdso.so.1 =>  (0x00007fffcedde000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6.0.16 
libgcc_s.so.1 => /lib64/libgcc_s-4.4.6-20110824.so.1
libc.so.6 => /lib64/libc.so.6 
libm.so.6 => /lib64/libm.so.6 
/lib64/ld-linux-x86-64.so.2 (0x0000560caff38000)

2. gdb when it segfaults

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7dea45c in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.209.62.al12.x86_64
(gdb) bt
#0  0x00007ffff7dea45c in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
#1  0x00007ffff7df0c55 in _dl_runtime_resolve () from /lib64/ld-linux-x86-64.so.2
#2  0x00007ffff7b1dc41 in std::locale::_S_initialize() () from $project_dir/build/lib/libstdc++.so.6
#3  0x00007ffff7b1dc85 in std::locale::locale() () from $project_dir/build/lib/libstdc++.so.6
#4  0x00007ffff7b1a574 in std::ios_base::Init::Init() () from $project_dir/build/lib/libstdc++.so.6
#5  0x0000000000400fde in _GLOBAL__sub_I_main () at $project_dir/build/gcc-4.9.4/include/c++/4.9.4/iostream:74
#6  0x00000000004012ed in __libc_csu_init ()
#7  0x00007ffff7518cb0 in __libc_start_main () from /lib64/libc.so.6
#8  0x0000000000401021 in _start ()
(gdb)

3. LD_DEBUG=all

I also tried to inspect the dynamic linker's activity by enabling LD_DEBUG=all for the segfault case. I found something suspicious: it searches for the pthread_once symbol, and when it is unable to find it, the program segfaults (that is my interpretation of the following output snippet, BTW):

initialize program: $project_dir/build/bin/run

symbol=_ZNSt8ios_base4InitC1Ev;  lookup in file=$project_dir/build/bin/run [0]
symbol=_ZNSt8ios_base4InitC1Ev;  lookup in file=$project_dir/build/lib/libstdc++.so.6 [0]
binding file $project_dir/build/bin/run [0] to $project_dir/build/lib/libstdc++.so.6 [0]: normal symbol `_ZNSt8ios_base4InitC1Ev' [GLIBCXX_3.4]
symbol=_ZNSt6localeC1Ev;  lookup in file=$project_dir/build/bin/run [0]
symbol=_ZNSt6localeC1Ev;  lookup in file=$project_dir/build/lib/libstdc++.so.6 [0]
binding file $project_dir/build/lib/libstdc++.so.6 [0] to $project_dir/build/lib/libstdc++.so.6 [0]: normal symbol `_ZNSt6localeC1Ev' [GLIBCXX_3.4]
symbol=pthread_once;  lookup in file=$project_dir/build/bin/run [0]
symbol=pthread_once;  lookup in file=$project_dir/build/lib/libstdc++.so.6 [0]
symbol=pthread_once;  lookup in file=$project_dir/build/lib/libgcc_s.so.1 [0]
symbol=pthread_once;  lookup in file=/lib64/libc.so.6 [0]
symbol=pthread_once;  lookup in file=/lib64/libm.so.6 [0]
symbol=pthread_once;  lookup in file=/lib64/ld-linux-x86-64.so.2 [0]

But I don't see any pthread_once lookup in the case when it runs successfully!


Questions

I know that it's very difficult to debug like this, and I've probably not given much information about the environment. But still, my question is: what could be the root cause of this segfault? How can I debug further and find it? Once I find the issue, the fix should be easy.


Compiler and Platform

I'm using GCC 4.9 on RHEL5.


Experiments

E#1

If I comment out the following line:

std::vector<std::shared_ptr<int>> y {};

it compiles and runs fine!

E#2

I included the following header in my program:

#include <boost/filesystem.hpp>

and linked accordingly. Now it works without any segfault. So it seems that by depending on libboost_system.so.1.53.0, some requirement is met, or the problem is circumvented!

E#3

Since it worked when the executable was linked against libboost_system.so.1.53.0, I did the following things step by step.

Instead of using #include <boost/filesystem.hpp> in the code itself, I used the original code and ran it with libboost_system.so preloaded via LD_PRELOAD, as follows:

$ LD_PRELOAD=$project_dir/build/lib/libboost_system.so $project_dir/build/bin/run

and it ran successfully!

Next, I ran ldd on libboost_system.so, which gave a list of libraries, two of which were:

  /lib64/librt.so.1
  /lib64/libpthread.so.0

So instead of preloading libboost_system, I preloaded librt and libpthread separately:

$ LD_PRELOAD=/lib64/librt.so.1 $project_dir/build/bin/run

$ LD_PRELOAD=/lib64/libpthread.so.0 $project_dir/build/bin/run

In both cases, it ran successfully.

Now my conclusion is that by loading either librt or libpthread (or both), some requirement is met or the problem is circumvented! I still don't know the root cause of the issue, though.


Compilation and Linking Options

The build system is complex and passes lots of options by default. So I tried explicitly adding -lpthread using CMake's set command, and then it worked, which is consistent with what we already saw when preloading libpthread!

In order to see the build difference between these two cases (when-it-works and when-it-gives-segfault), I built it in verbose mode by passing -v to GCC, to see the compilation stages and the options it actually passes to cc1plus (compiler) and collect2 (linker).

(Note that paths have been edited for brevity, using dollar signs and dummy paths.)

$/gcc-4.9.4/cc1plus -quiet -v -I /a/include -I /b/include -iprefix $/gcc-4.9.4/ -MMD main.cpp.d -MF main.cpp.o.d -MT main.cpp.o -D_GNU_SOURCE -D_REENTRANT -D __USE_XOPEN2K8 -D _LARGEFILE_SOURCE -D _FILE_OFFSET_BITS=64 -D __STDC_FORMAT_MACROS -D __STDC_LIMIT_MACROS -D NDEBUG $/lab/main.cpp -quiet -dumpbase main.cpp -msse -mfpmath=sse -march=core2 -auxbase-strip main.cpp.o -g -O3 -Wall -Wextra -std=gnu++1y -version -fdiagnostics-color=auto -ftemplate-depth=128 -fno-operator-names -o /tmp/ccxfkRyd.s

Irrespective of whether it works or not, the command-line arguments to cc1plus are exactly the same. No difference at all. That does not seem to be very helpful.

The difference, however, is at link time. Here is what I see for the case when it works:

$/gcc-4.9.4/collect2 -plugin $/gcc-4.9.4/liblto_plugin.so
-plugin-opt=$/gcc-4.9.4/lto-wrapper -plugin-opt=-fresolution=/tmp/cchl8RtI.res -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lpthread -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lgcc --eh-frame-hdr -m elf_x86_64 -export-dynamic -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o run /usr/lib/../lib64/crt1.o /usr/lib/../lib64/crti.o $/gcc-4.9.4/crtbegin.o -L/a/lib -L/b/lib -L/c/lib -lpthread --as-needed main.cpp.o -lboost_timer -lboost_wave -lboost_chrono -lboost_filesystem -lboost_graph -lboost_locale -lboost_thread -lboost_wserialization -lboost_atomic -lboost_context -lboost_date_time -lboost_iostreams -lboost_math_c99 -lboost_math_c99f -lboost_math_c99l -lboost_math_tr1 -lboost_math_tr1f -lboost_math_tr1l -lboost_mpi -lboost_prg_exec_monitor -lboost_program_options -lboost_random -lboost_regex -lboost_serialization -lboost_signals -lboost_system -lboost_unit_test_framework -lboost_exception -lboost_test_exec_monitor -lbz2 -licui18n -licuuc -licudata -lz -rpath /a/lib:/b/lib:/c/lib: -lstdc++ -lm -lgcc_s -lgcc -lpthread -lc -lgcc_s -lgcc $/gcc-4.9.4/crtend.o /usr/lib/../lib64/crtn.o

As you can see, -lpthread is mentioned twice! The first -lpthread (which is followed by --as-needed) is missing in the case that segfaults. That is the only difference between the two cases.


Output of nm -C in both cases

Interestingly, the output of nm -C is identical in both cases (if you ignore the integer values in the first column).

0000000000402580 d _DYNAMIC
0000000000402798 d _GLOBAL_OFFSET_TABLE_
0000000000401000 t _GLOBAL__sub_I_main
0000000000401358 R _IO_stdin_used
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
                 w _Jv_RegisterClasses
                 U _Unwind_Resume
0000000000401150 W std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_destroy()
0000000000401170 W std::vector<std::shared_ptr<int>, std::allocator<std::shared_ptr<int> > >::~vector()
0000000000401170 W std::vector<std::shared_ptr<int>, std::allocator<std::shared_ptr<int> > >::~vector()
0000000000401250 W std::vector<std::unique_ptr<int, std::default_delete<int> >, std::allocator<std::unique_ptr<int, std::default_delete<int> > > >::~vector()
0000000000401250 W std::vector<std::unique_ptr<int, std::default_delete<int> >, std::allocator<std::unique_ptr<int, std::default_delete<int> > > >::~vector()
                 U std::ios_base::Init::Init()
                 U std::ios_base::Init::~Init()
0000000000402880 B std::cout
                 U std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)
0000000000402841 b std::__ioinit
                 U std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
                 U operator delete(void*)
                 U operator new(unsigned long)
0000000000401510 r __FRAME_END__
0000000000402818 d __JCR_END__
0000000000402818 d __JCR_LIST__
0000000000402820 d __TMC_END__
0000000000402820 d __TMC_LIST__
0000000000402838 A __bss_start
                 U __cxa_atexit
0000000000402808 D __data_start
0000000000401100 t __do_global_dtors_aux
0000000000402820 t __do_global_dtors_aux_fini_array_entry
0000000000402810 d __dso_handle
0000000000402828 t __frame_dummy_init_array_entry
                 w __gmon_start__
                 U __gxx_personality_v0
0000000000402838 t __init_array_end
0000000000402828 t __init_array_start
00000000004012b0 T __libc_csu_fini
00000000004012c0 T __libc_csu_init
                 U __libc_start_main
                 w __pthread_key_create
0000000000402838 A _edata
0000000000402990 A _end
000000000040134c T _fini
0000000000400e68 T _init
0000000000401028 T _start
0000000000401054 t call_gmon_start
0000000000402840 b completed.6661
0000000000402808 W data_start
0000000000401080 t deregister_tm_clones
0000000000401120 t frame_dummy
0000000000400f40 T main
00000000004010c0 t register_tm_clones

Solution

Given the point of the crash, and the fact that preloading libpthread seems to fix it, I believe that the execution of the two cases diverges at locale_init.cc:315. Here is an extract of the code:

  void
  locale::_S_initialize()
  {
#ifdef __GTHREADS
    if (__gthread_active_p())
      __gthread_once(&_S_once, _S_initialize_once);
#endif
    if (!_S_classic)
      _S_initialize_once();
  }

__gthread_active_p() returns true if your program is linked against pthread; specifically, it checks whether pthread_key_create is available. On my system, __gthread_active_p is defined in "/usr/include/c++/7.2.0/x86_64-pc-linux-gnu/bits/gthr-default.h" as static inline, hence it is a potential source of ODR violation.

Notice that LD_PRELOAD=libpthread.so will always cause __gthread_active_p() to return true.

__gthread_once is another inlined symbol that should always forward to pthread_once.

It's hard to guess what's going on without debugging, but I suspect that you are hitting the true branch of __gthread_active_p() even when it shouldn't, and the program then crashes because there is no pthread_once to call.

EDIT: So I did some experiments, and the only way I can see to get a crash in std::locale::_S_initialize is if __gthread_active_p returns true but pthread_once is not linked in.

libstdc++ does not link directly against pthread, but it imports half of pthread_xx as weak objects, which means they can be undefined and not cause a linker error.

Obviously linking pthread will make the crash disappear, but if I am right, the main issue is that your libstdc++ thinks that it is inside a multi-threaded executable even if we did not link pthread in.

Now, __gthread_active_p uses __pthread_key_create to decide whether we have threads or not. This is defined in your executable as a weak object (it can be nullptr and still be fine). I am 99% sure that the symbol is there because of shared_ptr (remove it and check nm again to be sure). So, somehow __pthread_key_create gets bound to a valid address, maybe because of that last -lpthread in your linker flags. You can verify this theory by putting a breakpoint at locale_init.cc:315 and checking which branch you take.
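The weak-symbol mechanism described here can be illustrated with a small sketch. It assumes GCC or Clang on an ELF platform such as Linux. The names hypothetical_missing_once and hypothetical_present_once are invented for illustration and are not part of libstdc++ or pthread. The point is only that a weak reference links without error, has a null address when nothing provides it, and must therefore be checked before it is called.

```cpp
#include <cassert>

// Hypothetical symbol that no library provides. Because the reference is
// weak, the link succeeds and the address is null at run time.
extern "C" int hypothetical_missing_once(void) __attribute__((weak));

// Hypothetical symbol that this translation unit does provide.
extern "C" int hypothetical_present_once(void) { return 42; }

using once_fn = int (*)(void);

// Mirrors the idea behind an "is the threading library there?" test:
// a weak symbol is usable only if its address is non-null.
bool symbol_available(once_fn fn) { return fn != nullptr; }

// Calls fn only when it was resolved; calling an unresolved weak
// function would jump to address zero and crash.
int call_or_fallback(once_fn fn, int fallback) {
    return symbol_available(fn) ? fn() : fallback;
}

// Small self-check of the behaviour described above.
void self_check() {
    assert(!symbol_available(hypothetical_missing_once));
    assert(symbol_available(hypothetical_present_once));
    assert(call_or_fallback(hypothetical_missing_once, -1) == -1);
    assert(call_or_fallback(hypothetical_present_once, -1) == 42);
}
```

The crash discussed in this answer corresponds to the situation where the availability test passes for one symbol while a different symbol, the one that actually gets called, was never resolved.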

EDIT2:

Summary of the comments: the issue is only reproducible if all of the following hold:

  1. Using ld.gold instead of ld.bfd.
  2. Using --as-needed.
  3. Forcing a weak definition of __pthread_key_create, in this case via instantiation of std::shared_ptr.
  4. Not linking to pthread, or linking pthread after --as-needed.

To answer the questions in the comments:

Why does it use gold by default?

By default it uses /usr/bin/ld, which on most distros is a symlink to either /usr/bin/ld.bfd or /usr/bin/ld.gold. This default can be changed using update-alternatives. I am not sure why it is ld.gold in your case; as far as I understand, RHEL5 ships with ld.bfd as the default.

And why does gold not add pthread.so dependency to the binary if it is needed?

Because the definition of what is needed is somewhat shady. man ld says (emphasis mine):

--as-needed

--no-as-needed

This option affects ELF DT_NEEDED tags for dynamic libraries mentioned on the command line after the --as-needed option. Normally the linker will add a DT_NEEDED tag for each dynamic library mentioned on the command line, regardless of whether the library is actually needed or not. --as-needed causes a DT_NEEDED tag to only be emitted for a library that at that point in the link satisfies a non-weak undefined symbol reference from a regular object file or, if the library is not found in the DT_NEEDED lists of other needed libraries, a non-weak undefined symbol reference from another needed dynamic library. Object files or libraries appearing on the command line after the library in question do not affect whether the library is seen as needed. This is similar to the rules for extraction of object files from archives. --no-as-needed restores the default behaviour.

Now, according to this bug report (Bugzilla id 16417), gold is honoring the "non-weak undefined symbol" part, while ld.bfd treats weak symbols as needed. TBH I do not have a full understanding of this, and there is some discussion at that link as to whether this should be considered an ld.gold bug or a libstdc++ bug.

Why do I need to mention both -pthread and -lpthread? (-pthread is passed by default by our build system, and I've passed -lpthread to make it work when gold is used.)
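To make the difference between the two readings of --as-needed concrete, here is a toy model of the decision, written only from the description in this answer. It is not how either linker is implemented, and all names (UndefinedRef, Policy, emits_dt_needed) are made up for the sketch. It also ignores command-line ordering and the DT_NEEDED lists of other libraries.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// One undefined symbol reference found in the object files being linked.
struct UndefinedRef {
    std::string symbol;
    bool weak;
};

// The two readings discussed above: only non-weak references make a
// library "needed", or weak references count as well.
enum class Policy { NonWeakOnly, WeakCounts };

// Returns true if, under the given policy, a library exporting
// library_exports would get a DT_NEEDED entry for these references.
bool emits_dt_needed(const std::vector<UndefinedRef>& refs,
                     const std::set<std::string>& library_exports,
                     Policy policy) {
    for (const auto& ref : refs) {
        if (library_exports.count(ref.symbol) == 0) continue;
        if (!ref.weak || policy == Policy::WeakCounts) return true;
    }
    return false;
}

// Self-check using the symbols from this question.
void check_model() {
    const std::set<std::string> libpthread = {"__pthread_key_create", "pthread_once"};
    const std::vector<UndefinedRef> weak_only = {{"__pthread_key_create", true}};
    const std::vector<UndefinedRef> strong = {{"pthread_once", false}};

    assert(!emits_dt_needed(weak_only, libpthread, Policy::NonWeakOnly));
    assert(emits_dt_needed(weak_only, libpthread, Policy::WeakCounts));
    assert(emits_dt_needed(strong, libpthread, Policy::NonWeakOnly));
}
```

Under the non-weak-only reading, an executable whose only reference to pthread is the weak __pthread_key_create ends up without a DT_NEEDED entry for libpthread, which matches the failing case in the question.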

-pthread and -lpthread do different things (see pthread vs lpthread). It is my understanding that the former should imply the latter.

Regardless, you can probably pass -lpthread only once, but you need to do it before --as-needed, or use --no-as-needed after the last library and before -lpthread.
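One way to confirm from inside the process whether libpthread actually ended up loaded is to list the loaded shared objects. The sketch below assumes glibc on Linux and uses dl_iterate_phdr from <link.h>. The helper names loaded_objects and mentions_library are invented for this example. Note that on glibc 2.34 and later the pthread functions live in libc itself, so a separate libpthread entry may not appear there.

```cpp
#include <link.h>   // dl_iterate_phdr, dl_phdr_info (glibc)
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Callback for dl_iterate_phdr: record the path of each loaded object.
// The main executable is typically reported with an empty name.
static int collect_name(dl_phdr_info* info, std::size_t, void* data) {
    assert(data != nullptr);
    auto* names = static_cast<std::vector<std::string>*>(data);
    names->emplace_back(info->dlpi_name ? info->dlpi_name : "");
    return 0;  // zero means: keep iterating
}

// Returns the paths of all objects currently mapped into the process.
std::vector<std::string> loaded_objects() {
    std::vector<std::string> names;
    dl_iterate_phdr(collect_name, &names);
    return names;
}

// True if any loaded object's path contains the given substring.
bool mentions_library(const std::vector<std::string>& names,
                      const std::string& needle) {
    for (const auto& name : names)
        if (name.find(needle) != std::string::npos) return true;
    return false;
}
```

Calling mentions_library(loaded_objects(), "libpthread") at the start of main would then report whether the link-order change had the intended effect, similar to what ldd shows from the outside.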

It is also worth mentioning that I was not able to reproduce this issue on my system (GCC 7.2), even using the gold linker. So I suspect that it has been fixed in a more recent version of libstdc++, which might also explain why it does not segfault if you use the system standard library.
