Segfault on declaring a variable of type vector<shared_ptr<int>>
Problem description
Code
Here is the program that gives the segfault.
#include <iostream>
#include <vector>
#include <memory>

int main()
{
    std::cout << "Hello World" << std::endl;
    std::vector<std::shared_ptr<int>> y {};
    std::cout << "Hello World" << std::endl;
}
Of course, there is absolutely nothing wrong with the program itself. The root cause of the segfault depends on the environment in which it is built and run.
Background
We, at Amazon, use a build system which builds and deploys the binaries (lib and bin) in an almost machine-independent way. For our case, that basically means it deploys the executable (built from the above program) into $project_dir/build/bin/ and almost all its dependencies (i.e. the shared libraries) into $project_dir/build/lib/. I used the phrase "almost" because for shared libraries such as libc.so, libm.so, ld-linux-x86-64.so.2 and possibly a few others, the executable picks them from the system (i.e. from /lib64). Note, though, that it is supposed to pick libstdc++ from $project_dir/build/lib.
Now I run it as follows:
$ LD_LIBRARY_PATH=$project_dir/build/lib ./build/bin/run
segmentation fault
However, if I run it without setting LD_LIBRARY_PATH, it runs fine.
Diagnostics
1. ldd
Here is the ldd output for both cases (please note that I've edited the output to mention the full version of the libraries wherever there is a difference):
$ LD_LIBRARY_PATH=$project_dir/build/lib ldd ./build/bin/run
linux-vdso.so.1 => (0x00007ffce19ca000)
libstdc++.so.6 => $project_dir/build/lib/libstdc++.so.6.0.20
libgcc_s.so.1 => $project_dir/build/lib/libgcc_s.so.1
libc.so.6 => /lib64/libc.so.6
libm.so.6 => /lib64/libm.so.6
/lib64/ld-linux-x86-64.so.2 (0x0000562ec51bc000)
And without LD_LIBRARY_PATH:
$ ldd ./build/bin/run
linux-vdso.so.1 => (0x00007fffcedde000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6.0.16
libgcc_s.so.1 => /lib64/libgcc_s-4.4.6-20110824.so.1
libc.so.6 => /lib64/libc.so.6
libm.so.6 => /lib64/libm.so.6
/lib64/ld-linux-x86-64.so.2 (0x0000560caff38000)
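Comparing the two ldd outputs by eye works here, but with longer dependency lists it helps to diff them mechanically. The following is a minimal sketch, not part of the original post: the helper names parse_ldd and resolved_differently are made up for illustration, and it assumes every line has the usual "name => path" or "name (address)" shape shown above.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Parses ldd output into {soname -> resolved path}.
// Skips entries with no backing file (e.g. "linux-vdso.so.1 => (0x...)")
// and lines without "=>" (e.g. the dynamic loader line).
std::map<std::string, std::string> parse_ldd(const std::string& text) {
    std::map<std::string, std::string> libs;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string name, arrow, path;
        if (!(fields >> name >> arrow >> path) || arrow != "=>") continue;
        if (path[0] == '(') continue;  // only a load address, no file
        libs[name] = path;
    }
    return libs;
}

// Returns the sonames that resolve to different files in the two outputs,
// or that appear in only one of them.
std::set<std::string> resolved_differently(const std::string& a, const std::string& b) {
    const std::map<std::string, std::string> la = parse_ldd(a), lb = parse_ldd(b);
    std::set<std::string> diff;
    for (const auto& entry : la) {
        auto it = lb.find(entry.first);
        if (it == lb.end() || it->second != entry.second) diff.insert(entry.first);
    }
    for (const auto& entry : lb) {
        if (la.find(entry.first) == la.end()) diff.insert(entry.first);
    }
    return diff;
}
```

Fed the two outputs above, it reports libstdc++.so.6 and libgcc_s.so.1, i.e. exactly the libraries that LD_LIBRARY_PATH redirects to $project_dir/build/lib.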
2. gdb when it segfaults
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7dea45c in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.209.62.al12.x86_64
(gdb) bt
#0  0x00007ffff7dea45c in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
#1  0x00007ffff7df0c55 in _dl_runtime_resolve () from /lib64/ld-linux-x86-64.so.2
#2  0x00007ffff7b1dc41 in std::locale::_S_initialize() () from $project_dir/build/lib/libstdc++.so.6
#3  0x00007ffff7b1dc85 in std::locale::locale() () from $project_dir/build/lib/libstdc++.so.6
#4  0x00007ffff7b1a574 in std::ios_base::Init::Init() () from $project_dir/build/lib/libstdc++.so.6
#5  0x0000000000400fde in _GLOBAL__sub_I_main () at $project_dir/build/gcc-4.9.4/include/c++/4.9.4/iostream:74
#6  0x00000000004012ed in __libc_csu_init ()
#7  0x00007ffff7518cb0 in __libc_start_main () from /lib64/libc.so.6
#8  0x0000000000401021 in _start ()
(gdb)
3. LD_DEBUG=all
I also tried to see the linker information by enabling LD_DEBUG=all for the segfault case. I found something suspicious: it searches for the pthread_once symbol, and when it is unable to find it, it segfaults (that is my interpretation of the following output snippet, BTW):
initialize program: $project_dir/build/bin/run
symbol=_ZNSt8ios_base4InitC1Ev; lookup in file=$project_dir/build/bin/run [0]
symbol=_ZNSt8ios_base4InitC1Ev; lookup in file=$project_dir/build/lib/libstdc++.so.6 [0]
binding file $project_dir/build/bin/run [0] to $project_dir/build/lib/libstdc++.so.6 [0]: normal symbol `_ZNSt8ios_base4InitC1Ev' [GLIBCXX_3.4]
symbol=_ZNSt6localeC1Ev; lookup in file=$project_dir/build/bin/run [0]
symbol=_ZNSt6localeC1Ev; lookup in file=$project_dir/build/lib/libstdc++.so.6 [0]
binding file $project_dir/build/lib/libstdc++.so.6 [0] to $project_dir/build/lib/libstdc++.so.6 [0]: normal symbol `_ZNSt6localeC1Ev' [GLIBCXX_3.4]
symbol=pthread_once; lookup in file=$project_dir/build/bin/run [0]
symbol=pthread_once; lookup in file=$project_dir/build/lib/libstdc++.so.6 [0]
symbol=pthread_once; lookup in file=$project_dir/build/lib/libgcc_s.so.1 [0]
symbol=pthread_once; lookup in file=/lib64/libc.so.6 [0]
symbol=pthread_once; lookup in file=/lib64/libm.so.6 [0]
symbol=pthread_once; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
But I don't see any pthread_once in the output for the case when it runs successfully!
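The reading of the LD_DEBUG output above (a symbol is looked up in every loaded object but never bound) can be checked with a small filter. This is a sketch under assumptions: looked_up_but_unbound is a hypothetical helper, and it relies on the two line formats visible in the snippet ("symbol=...; lookup in file=..." and "binding file ... `name'"). It is only a heuristic, since a symbol can also look unbound simply because the log stops early, which is exactly what a crash does.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Returns the symbols that have "lookup in file=" lines in an LD_DEBUG log
// but no matching "binding file ... `symbol'" line.
std::set<std::string> looked_up_but_unbound(const std::string& log) {
    std::set<std::string> looked_up, bound;
    std::istringstream in(log);
    std::string line;
    const std::string key = "symbol=";
    while (std::getline(in, line)) {
        const auto s = line.find(key);
        if (s != std::string::npos && line.find("lookup in file=") != std::string::npos) {
            const auto start = s + key.size();
            const auto end = line.find(';', start);
            const auto len = (end == std::string::npos) ? std::string::npos : end - start;
            looked_up.insert(line.substr(start, len));
            continue;
        }
        // Binding lines quote the symbol as `name'.
        const auto open = line.find('`');
        if (line.find("binding file") != std::string::npos && open != std::string::npos) {
            const auto close = line.find('\'', open + 1);
            if (close != std::string::npos) bound.insert(line.substr(open + 1, close - open - 1));
        }
    }
    std::set<std::string> result;
    for (const std::string& sym : looked_up) {
        if (bound.count(sym) == 0) result.insert(sym);
    }
    return result;
}
```

Run over the snippet above, it reports only pthread_once.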
Questions
I know that it's very difficult to debug like this, and I probably haven't given a lot of information about the environment and so on. But still, my question is: what could be the possible root cause of this segfault? How can I debug further to find it? Once I find the issue, the fix would be easy.
Compiler and Platform
I'm using GCC 4.9 on RHEL5.
Experiments
E#1
If I comment out the following line:
std::vector<std::shared_ptr<int>> y {};
It compiles and runs fine!
E#2
I just included the following header in my program:
#include <boost/filesystem.hpp>
and linked accordingly. Now it works without any segfault. So it seems that by having a dependency on libboost_system.so.1.53.0, some requirement is met, or the problem is circumvented!
E#3
Since I saw it working when I linked the executable against libboost_system.so.1.53.0, I did the following things step by step.
Instead of using #include <boost/filesystem.hpp> in the code itself, I used the original code and ran it by preloading libboost_system.so using LD_PRELOAD as follows:
$ LD_PRELOAD=$project_dir/build/lib/libboost_system.so $project_dir/build/bin/run
and it ran successfully!
Next I ran ldd on libboost_system.so, which gave a list of libs, two of which were:
/lib64/librt.so.1
/lib64/libpthread.so.0
So instead of preloading libboost_system, I preloaded librt and libpthread separately:
$ LD_PRELOAD=/lib64/librt.so.1 $project_dir/build/bin/run
$ LD_PRELOAD=/lib64/libpthread.so.0 $project_dir/build/bin/run
In both cases, it ran successfully.
Now my conclusion is that by loading either librt or libpthread (or both), some requirement is met or the problem is circumvented! I still don't know the root cause of the issue, though.
Compilation and Linking Options
The build system is complex and there are lots of options set by default, so I tried explicitly adding -lpthread using CMake's set command, and then it worked, just as it did when preloading libpthread!
In order to see the build difference between these two cases (when-it-works and when-it-gives-segfault), I built it in verbose mode by passing -v to GCC, to see the compilation stages and the options it actually passes to cc1plus (compiler) and collect2 (linker).
(Note that paths have been edited for brevity, using a dollar sign and dummy paths.)
$/gcc-4.9.4/cc1plus -quiet -v -I /a/include -I /b/include -iprefix $/gcc-4.9.4/ -MMD main.cpp.d -MF main.cpp.o.d -MT main.cpp.o -D_GNU_SOURCE -D_REENTRANT -D __USE_XOPEN2K8 -D _LARGEFILE_SOURCE -D _FILE_OFFSET_BITS=64 -D __STDC_FORMAT_MACROS -D __STDC_LIMIT_MACROS -D NDEBUG $/lab/main.cpp -quiet -dumpbase main.cpp -msse -mfpmath=sse -march=core2 -auxbase-strip main.cpp.o -g -O3 -Wall -Wextra -std=gnu++1y -version -fdiagnostics-color=auto -ftemplate-depth=128 -fno-operator-names -o /tmp/ccxfkRyd.s
Irrespective of whether it works or not, the command-line arguments to cc1plus are exactly the same. No difference at all. That does not seem to be very helpful.
The difference, however, is at link time. Here is what I see for the case when it works:
$/gcc-4.9.4/collect2 -plugin $/gcc-4.9.4/liblto_plugin.so -plugin-opt=$/gcc-4.9.4/lto-wrapper -plugin-opt=-fresolution=/tmp/cchl8RtI.res -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lpthread -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lgcc --eh-frame-hdr -m elf_x86_64 -export-dynamic -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o run /usr/lib/../lib64/crt1.o /usr/lib/../lib64/crti.o $/gcc-4.9.4/crtbegin.o -L/a/lib -L/b/lib -L/c/lib -lpthread --as-needed main.cpp.o -lboost_timer -lboost_wave -lboost_chrono -lboost_filesystem -lboost_graph -lboost_locale -lboost_thread -lboost_wserialization -lboost_atomic -lboost_context -lboost_date_time -lboost_iostreams -lboost_math_c99 -lboost_math_c99f -lboost_math_c99l -lboost_math_tr1 -lboost_math_tr1f -lboost_math_tr1l -lboost_mpi -lboost_prg_exec_monitor -lboost_program_options -lboost_random -lboost_regex -lboost_serialization -lboost_signals -lboost_system -lboost_unit_test_framework -lboost_exception -lboost_test_exec_monitor -lbz2 -licui18n -licuuc -licudata -lz -rpath /a/lib:/b/lib:/c/lib: -lstdc++ -lm -lgcc_s -lgcc -lpthread -lc -lgcc_s -lgcc $/gcc-4.9.4/crtend.o /usr/lib/../lib64/crtn.o
As you can see, -lpthread is mentioned twice! The first -lpthread (which is followed by --as-needed) is missing for the case when it gives the segfault. That is the only difference between these two cases.
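Because the working and failing link commands differ by a single token, an order-insensitive token diff is enough to spot it. A minimal sketch, assuming whitespace-separated arguments; token_count_diff is a hypothetical helper, and in the usage the commands are abbreviated, with the failing one reconstructed from the description above rather than copied from a real log.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Compares two whitespace-separated command lines, ignoring order.
// Returns token -> (occurrences in a) - (occurrences in b), non-zero entries only.
std::map<std::string, int> token_count_diff(const std::string& a, const std::string& b) {
    std::map<std::string, int> counts;
    std::string tok;
    std::istringstream ia(a);
    while (ia >> tok) ++counts[tok];
    std::istringstream ib(b);
    while (ib >> tok) --counts[tok];
    for (auto it = counts.begin(); it != counts.end();) {
        if (it->second == 0) it = counts.erase(it);
        else ++it;
    }
    return counts;
}
```

Given the working command and the same command without the first -lpthread, it reports a single entry: -lpthread with a count of 1. The design trade-off is that counting ignores position, so it tells you which token went missing, but you still have to check where it sat relative to --as-needed, which is what matters here.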
Output of nm -C in both cases
Interestingly, the output of nm -C is identical in both cases (if you ignore the addresses in the first column):
0000000000402580 d _DYNAMIC
0000000000402798 d _GLOBAL_OFFSET_TABLE_
0000000000401000 t _GLOBAL__sub_I_main
0000000000401358 R _IO_stdin_used
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
                 w _Jv_RegisterClasses
                 U _Unwind_Resume
0000000000401150 W std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_destroy()
0000000000401170 W std::vector<std::shared_ptr<int>, std::allocator<std::shared_ptr<int> > >::~vector()
0000000000401170 W std::vector<std::shared_ptr<int>, std::allocator<std::shared_ptr<int> > >::~vector()
0000000000401250 W std::vector<std::unique_ptr<int, std::default_delete<int> >, std::allocator<std::unique_ptr<int, std::default_delete<int> > > >::~vector()
0000000000401250 W std::vector<std::unique_ptr<int, std::default_delete<int> >, std::allocator<std::unique_ptr<int, std::default_delete<int> > > >::~vector()
                 U std::ios_base::Init::Init()
                 U std::ios_base::Init::~Init()
0000000000402880 B std::cout
                 U std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)
0000000000402841 b std::__ioinit
                 U std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
                 U operator delete(void*)
                 U operator new(unsigned long)
0000000000401510 r __FRAME_END__
0000000000402818 d __JCR_END__
0000000000402818 d __JCR_LIST__
0000000000402820 d __TMC_END__
0000000000402820 d __TMC_LIST__
0000000000402838 A __bss_start
                 U __cxa_atexit
0000000000402808 D __data_start
0000000000401100 t __do_global_dtors_aux
0000000000402820 t __do_global_dtors_aux_fini_array_entry
0000000000402810 d __dso_handle
0000000000402828 t __frame_dummy_init_array_entry
                 w __gmon_start__
                 U __gxx_personality_v0
0000000000402838 t __init_array_end
0000000000402828 t __init_array_start
00000000004012b0 T __libc_csu_fini
00000000004012c0 T __libc_csu_init
                 U __libc_start_main
                 w __pthread_key_create
0000000000402838 A _edata
0000000000402990 A _end
000000000040134c T _fini
0000000000400e68 T _init
0000000000401028 T _start
0000000000401054 t call_gmon_start
0000000000402840 b completed.6661
0000000000402808 W data_start
0000000000401080 t deregister_tm_clones
0000000000401120 t frame_dummy
0000000000400f40 T main
00000000004010c0 t register_tm_clones
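In nm output, undefined symbols have no address column, and the weak ones are typed w (or v for weak objects). A minimal filter for them, assuming the standard nm column layout; weak_undefined is a hypothetical helper name.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Returns the names of weak *undefined* symbols in `nm` (or `nm -C`) output.
// Undefined symbols have no address, so their first field is the one-letter type.
std::vector<std::string> weak_undefined(const std::string& nm_output) {
    std::vector<std::string> result;
    std::istringstream in(nm_output);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string first;
        if (!(fields >> first)) continue;              // blank line
        if (first != "w" && first != "v") continue;    // has an address, or is not weak
        std::string name;
        std::getline(fields >> std::ws, name);         // demangled names may contain spaces
        result.push_back(name);
    }
    return result;
}
```

On the listing above it would return _ITM_deregisterTMCloneTable, _ITM_registerTMCloneTable, _Jv_RegisterClasses, __gmon_start__ and __pthread_key_create; the last one is the weak reference the answer below focuses on.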
Solution
Given the point of the crash, and the fact that preloading libpthread seems to fix it, I believe that the execution of the two cases diverges at locale_init.cc:315. Here is an extract of the code:
void
locale::_S_initialize()
{
#ifdef __GTHREADS
  if (__gthread_active_p())
    __gthread_once(&_S_once, _S_initialize_once);
#endif
  if (!_S_classic)
    _S_initialize_once();
}
__gthread_active_p() returns true if your program is linked against pthread; specifically, it checks whether pthread_key_create is available. On my system, __gthread_active_p is defined in "/usr/include/c++/7.2.0/x86_64-pc-linux-gnu/bits/gthr-default.h" as static inline, hence it is a potential source of ODR violations.
Notice that LD_PRELOAD=libpthread.so will always cause __gthread_active_p() to return true.
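The weak-symbol trick behind __gthread_active_p() and the suspected crash path can be illustrated in isolation. This sketch uses two made-up symbols, demo_key_create and demo_once, in place of __pthread_key_create and pthread_once (using the real names would give results that depend on the glibc version, since glibc 2.34 moved libpthread into libc). It models the control flow of the extract above; it is not libstdc++'s actual code.

```cpp
#include <cassert>

// Hypothetical stand-ins for __pthread_key_create and pthread_once. Declared weak
// and deliberately never defined, so their addresses resolve to null instead of
// causing a link error.
extern "C" int demo_key_create(int* key) __attribute__((weak));
extern "C" int demo_once(int* flag, void (*init)()) __attribute__((weak));

using OnceFn = int (*)(int*, void (*)());

// Same idea as __gthread_active_p(): "threads are active" exactly when the weak
// reference was bound to a real definition.
bool demo_threads_active() { return &demo_key_create != nullptr; }

enum class InitPath { Direct, ViaOnce, NullOnceCall };

// Models the branches of locale::_S_initialize() from the extract.
InitPath demo_s_initialize_path(bool threads_active, OnceFn once) {
    if (threads_active) {
        // The real code calls through the once function here; with no definition
        // there is nothing valid to call and the process crashes
        // (in the question, inside the dynamic linker's _dl_fixup).
        return once != nullptr ? InitPath::ViaOnce : InitPath::NullOnceCall;
    }
    return InitPath::Direct;
}
```

Built on its own, demo_threads_active() is false, so the model takes the Direct path; forcing threads_active to true while demo_once is still unresolved lands on NullOnceCall, which is the combination the answer suspects.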
__gthread_once is another inlined symbol that should always forward to pthread_once.
It's hard to guess what's going on without debugging, but I suspect that you are hitting the true branch of __gthread_active_p() even when you shouldn't, and the program then crashes because there is no pthread_once to call.
EDIT: So I did some experiments; the only way I see to get a crash in std::locale::_S_initialize is if __gthread_active_p returns true but pthread_once is not linked in.
libstdc++ does not link directly against pthread, but it imports half of the pthread_xx functions as weak objects, which means they can be undefined and not cause a linker error.
Obviously, linking pthread will make the crash disappear, but if I am right, the main issue is that your libstdc++ thinks that it is inside a multi-threaded executable even though we did not link pthread in.
Now, __gthread_active_p uses __pthread_key_create to decide whether we have threads or not. This is declared in your executable as a weak symbol (it can be nullptr and still be fine). I am 99% sure that the symbol is there because of shared_ptr (remove it and check nm again to be sure). So, somehow __pthread_key_create gets bound to a valid address, maybe because of that last -lpthread in your linker flags. You can verify this theory by putting a breakpoint at locale_init.cc:315 and checking which branch you take.
EDIT2:
Summary of the comments: the issue is only reproducible if we have all of the following:
- Using ld.gold instead of ld.bfd
- Using --as-needed
- Forcing a weak reference to __pthread_key_create, in this case via instantiation of std::shared_ptr
- Not linking to pthread, or linking pthread after --as-needed
To answer the questions in the comments:
Why does it use gold by default?
By default it uses /usr/bin/ld, which on most distros is a symlink to either /usr/bin/ld.bfd or /usr/bin/ld.gold. Such a default can be changed using update-alternatives. I am not sure why in your case it is ld.gold; as far as I understand, RHEL5 ships with ld.bfd as the default.
And why does gold not add a pthread.so dependency to the binary if it is needed?
Because the definition of what is "needed" is somewhat shady. man ld says (emphasis mine):
--as-needed
--no-as-needed
This option affects ELF DT_NEEDED tags for dynamic libraries mentioned on the command line after the --as-needed option. Normally the linker will add a DT_NEEDED tag for each dynamic library mentioned on the command line, regardless of whether the library is actually needed or not. --as-needed causes a DT_NEEDED tag to only be emitted for a library that at that point in the link satisfies a non-weak undefined symbol reference from a regular object file or, if the library is not found in the DT_NEEDED lists of other needed libraries, a non-weak undefined symbol reference from another needed dynamic library. Object files or libraries appearing on the command line after the library in question do not affect whether the library is seen as needed. This is similar to the rules for extraction of object files from archives. --no-as-needed restores the default behaviour.
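The rule quoted above can be turned into a toy model to see why the position of -lpthread matters. Everything here is a simplification and an assumption: Item, Ref and dt_needed are hypothetical, only references from object files are tracked (no library-to-library references, versions or archives), and the count_weak_refs switch stands for the ld.bfd vs. gold difference that this answer attributes to the bug report.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct Ref {
    std::string symbol;
    bool weak;
};

// One entry of a toy link line.
struct Item {
    enum Kind { Object, SharedLib, AsNeeded, NoAsNeeded } kind;
    std::string name;
    std::vector<Ref> undefined;      // only used for Object
    std::set<std::string> defines;   // only used for SharedLib
};

// Returns the shared libraries that would receive a DT_NEEDED tag.
// count_weak_refs == false follows the "non-weak undefined symbol reference"
// wording of man ld; true models a linker that also counts weak references.
std::vector<std::string> dt_needed(const std::vector<Item>& link_line, bool count_weak_refs) {
    std::vector<std::string> needed;
    std::vector<Ref> pending;  // references not yet satisfied, in command-line order
    bool as_needed = false;
    for (const Item& item : link_line) {
        switch (item.kind) {
        case Item::AsNeeded:   as_needed = true;  break;
        case Item::NoAsNeeded: as_needed = false; break;
        case Item::Object:
            pending.insert(pending.end(), item.undefined.begin(), item.undefined.end());
            break;
        case Item::SharedLib: {
            bool satisfies = false;
            // Only references seen *before* this library can make it needed.
            for (auto it = pending.begin(); it != pending.end();) {
                if (item.defines.count(it->symbol) != 0) {
                    if (!it->weak || count_weak_refs) satisfies = true;
                    it = pending.erase(it);
                } else {
                    ++it;
                }
            }
            const bool already = std::find(needed.begin(), needed.end(), item.name) != needed.end();
            if ((!as_needed || satisfies) && !already) needed.push_back(item.name);
            break;
        }
        }
    }
    return needed;
}
```

With the failing layout (--as-needed, then main.o carrying a weak __pthread_key_create reference, then -lstdc++, then -lpthread), the model tags only libstdc++ when weak references do not count and both libraries when they do; an extra -lpthread before --as-needed makes libpthread needed either way.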
Now, according to this bug report, gold is honoring the "non-weak undefined symbol" part, while ld.bfd sees weak symbols as needed. TBH I do not have a full understanding of this, and there is some discussion at that link as to whether this should be considered an ld.gold bug or a libstdc++ bug.
Why do I need to mention both -pthread and -lpthread? (-pthread is passed by default by our build system, and I've passed -lpthread to make it work when gold is used.)
-pthread and -lpthread do different things (see pthread vs lpthread). It is my understanding that the former should imply the latter.
Regardless, you can probably pass -lpthread only once, but you need to do it before --as-needed, or use --no-as-needed after the last library and before -lpthread.
It is also worth mentioning that I was not able to reproduce this issue on my system (GCC 7.2), even using the gold linker. So I suspect that it has been fixed in a more recent version of libstdc++, which might also explain why it does not segfault if you use the system standard library.
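The placement advice (pass -lpthread before --as-needed, or put --no-as-needed right before -lpthread) can be applied mechanically to a collect2-style argument list like the one in the question. A sketch only: force_pthread_needed is a hypothetical helper, and at the gcc driver level the same wrapping is usually written as -Wl,--no-as-needed -lpthread -Wl,--as-needed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns a copy of a linker argument list in which every "-lpthread" that would be
// processed in --as-needed mode is wrapped as "--no-as-needed -lpthread --as-needed",
// so libpthread gets a DT_NEEDED tag whether or not weak references count.
std::vector<std::string> force_pthread_needed(const std::vector<std::string>& args) {
    std::vector<std::string> out;
    bool as_needed = false;
    for (const std::string& a : args) {
        if (a == "--as-needed") as_needed = true;
        else if (a == "--no-as-needed") as_needed = false;

        if (a == "-lpthread" && as_needed) {
            out.push_back("--no-as-needed");
            out.push_back(a);
            out.push_back("--as-needed");  // restore the mode for the libraries that follow
        } else {
            out.push_back(a);
        }
    }
    return out;
}
```

The design choice here is to re-enable --as-needed right after -lpthread, so the rest of the link (for example -lc and -lgcc_s) behaves as before; the simpler variant in the answer leaves --no-as-needed in effect for everything that follows.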