Invisible SIGSEGV on Linux that does not happen on Windows?

    INTRO

    I have a TCP/HTTP server that supports plugins in the form of shared libraries (DLL and .so). Its build files (make and .sln) are generated via premake. When I start my application I feed it a configuration file like this, describing which libraries the server shall load as plugins and which arguments it shall pass to them. For some time I had two plugins and everything worked just fine, and it still works fine if I feed my server config files like this. But now I am developing a new plugin, and so there is a new config file.

    SETUP

    The steps required to set up my server on Linux are few and simple:

    • download build script (from here as described here)
    • run ./cloud_server_net_setup.sh (no superuser needed; requires curl, make and g++). In the regular, non-development case this is enough: it will fetch Boost and the other libraries it needs into a local folder, build all of them, and build the server in release form.
    • now you can cd into cloud_server/install-dir/
    • call export LD_LIBRARY_PATH=./:./lib_boost
    • and run our server ./CloudServer

    But we need the debug version, so after we call the script we

    • cd cloud_server/CloudServer/projects/linux-gmake/
    • make
    • cd bin/debug
    • export LD_LIBRARY_PATH=./:(place from where we called our script)/cloud_server/install-dir/lib_boost

    PROBLEM

    • and now, finally we can call gdb.

    So we call it, and this is what we see:

     gdb ./CloudServer
    
    GNU gdb (GDB) 7.0.1-debian
    Copyright (C) 2009 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-linux-gnu".
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>...
    Reading symbols from /home/ole_jak/cloud_server/CloudServer/projects/linux-gmake/bin/debug/CloudServer...done.
    (gdb) r
    Starting program: /home/ole_jak/cloud_server/CloudServer/projects/linux-gmake/bin/debug/CloudServer
    [Thread debugging using libthread_db enabled]
    Cloud Server v0.5
    Copyright (c) 2011 Cloud Forever. All rights reserved.
    
    Type 'help' to see help messages.
    Config file path: config.xml
    [New Thread 0x7ffff5967700 (LWP 11516)]
    [New Thread 0x7ffff5166700 (LWP 11517)]
    [New Thread 0x7ffff4965700 (LWP 11518)]
    [New Thread 0x7ffff4164700 (LWP 11519)]
    [New Thread 0x7ffff3963700 (LWP 11520)]
    [New Thread 0x7ffff3162700 (LWP 11521)]
    [New Thread 0x7ffff2961700 (LWP 11522)]
    [New Thread 0x7ffff2160700 (LWP 11523)]
    [New Thread 0x7ffff195f700 (LWP 11524)]
    [New Thread 0x7ffff115e700 (LWP 11525)]
    [New Thread 0x7ffff095d700 (LWP 11526)]
    [New Thread 0x7fffebfff700 (LWP 11527)]
    [New Thread 0x7fffeb7fe700 (LWP 11528)]
    [New Thread 0x7fffeaffd700 (LWP 11529)]
    [New Thread 0x7fffea7fc700 (LWP 11530)]
    [New Thread 0x7fffe9ffb700 (LWP 11531)]
    Library libFileService.so opened.
    [New Thread 0x7fffe953c700 (LWP 11532)]
    Library libUsersFilesService.so opened.
    
    Program received signal SIGSEGV, Segmentation fault.
    0x0000000000000000 in ?? ()
    (gdb) x/i $pc
    0x0:    Cannot access memory at address 0x0
    

    I am a Linux newbie, and all I know about segmentation faults I know from Wikipedia. But I know one more thing about my server and the new service I am creating: it compiles and runs on Windows with no errors at all (VS2008 and VS2010 solutions can be created from the same premake script).

    So I wonder how and where in these two files (.cpp and .h) I have introduced an error that does not show up on Windows at all but shows up so dramatically on Linux. Is it fixable, or visible to a fresh eye?

    UPDATE: Valgrind output

    ole_jak@dspproc:~/cloud_server/CloudServer/projects/linux-gmake/bin/debug$ valgrind ./CloudServer
    ==11682== Memcheck, a memory error detector
    ==11682== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
    ==11682== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
    ==11682== Command: ./CloudServer
    ==11682==
    Cloud Server v0.5
    Copyright (c) 2011 Cloud Forever. All rights reserved.
    
    Type 'help' to see help messages.
    Config file path: config.xml
    Library libFileService.so opened.
    Library libUsersFilesService.so opened.
    ==11682== Jump to the invalid address stated on the next line
    ==11682==    at 0x0: ???
    ==11682==    by 0x4D49BE: sqlite3_free (sqlite3.c:18155)
    ==11682==    by 0x102242D5: sqlite3OsInit (sqlite3.c:14162)
    ==11682==    by 0x1029EB28: sqlite3_initialize (sqlite3.c:107299)
    ==11682==    by 0x102A159F: openDatabase (sqlite3.c:108909)
    ==11682==    by 0x102A1B29: sqlite3_open (sqlite3.c:109156)
    ==11682==    by 0x1021CAB0: sqlite3pp::database::connect(char const*) (sqlite3pp.cpp:89)
    ==11682==    by 0x1021C6E3: sqlite3pp::database::database(char const*) (sqlite3pp.cpp:74)
    ==11682==    by 0x1020DDDF: users_files_service::create_files_table(std::string) (users_files_service.cpp:171)
    ==11682==    by 0x1020BAFC: users_files_service::apply_config(boost::shared_ptr<boost::property_tree::basic_ptree<std::string, std::string, std::less<std::string> > >) (users_files_service.cpp:38)
    ==11682==    by 0x4B5432: server_utils::parse_config_services(boost::property_tree::basic_ptree<std::string, std::string, std::less<std::string> >) (server_utils.cpp:156)
    ==11682==    by 0x4B6436: server_utils::parse_config(boost::property_tree::basic_ptree<std::string, std::string, std::less<std::string> >) (server_utils.cpp:208)
    ==11682==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
    ==11682==
    ==11682==
    ==11682== Process terminating with default action of signal 11 (SIGSEGV)
    ==11682==  Bad permissions for mapped region at address 0x0
    ==11682==    at 0x0: ???
    ==11682==    by 0x4D49BE: sqlite3_free (sqlite3.c:18155)
    ==11682==    by 0x102242D5: sqlite3OsInit (sqlite3.c:14162)
    ==11682==    by 0x1029EB28: sqlite3_initialize (sqlite3.c:107299)
    ==11682==    by 0x102A159F: openDatabase (sqlite3.c:108909)
    ==11682==    by 0x102A1B29: sqlite3_open (sqlite3.c:109156)
    ==11682==    by 0x1021CAB0: sqlite3pp::database::connect(char const*) (sqlite3pp.cpp:89)
    ==11682==    by 0x1021C6E3: sqlite3pp::database::database(char const*) (sqlite3pp.cpp:74)
    ==11682==    by 0x1020DDDF: users_files_service::create_files_table(std::string) (users_files_service.cpp:171)
    ==11682==    by 0x1020BAFC: users_files_service::apply_config(boost::shared_ptr<boost::property_tree::basic_ptree<std::string, std::string, std::less<std::string> > >) (users_files_service.cpp:38)
    ==11682==    by 0x4B5432: server_utils::parse_config_services(boost::property_tree::basic_ptree<std::string, std::string, std::less<std::string> >) (server_utils.cpp:156)
    ==11682==    by 0x4B6436: server_utils::parse_config(boost::property_tree::basic_ptree<std::string, std::string, std::less<std::string> >) (server_utils.cpp:208)
    ==11682==
    ==11682== HEAP SUMMARY:
    ==11682==     in use at exit: 124,050 bytes in 1,083 blocks
    ==11682==   total heap usage: 1,814 allocs, 731 frees, 183,517 bytes allocated
    ==11682==
    ==11682== LEAK SUMMARY:
    ==11682==    definitely lost: 0 bytes in 0 blocks
    ==11682==    indirectly lost: 0 bytes in 0 blocks
    ==11682==      possibly lost: 46,248 bytes in 799 blocks
    ==11682==    still reachable: 77,802 bytes in 284 blocks
    ==11682==         suppressed: 0 bytes in 0 blocks
    ==11682== Rerun with --leak-check=full to see details of leaked memory
    ==11682==
    ==11682== For counts of detected and suppressed errors, rerun with: -v
    ==11682== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 4)
    Убито
    ole_jak@dspproc:~/cloud_server/CloudServer/projects/linux-gmake/bin/debug$
    

    SOLUTION

    This is a nasty one. I am unsure about the exact root cause, but this seems to be a multi-threading related issue. The immediate cause of the problem is that the sqlite3Config.m.xSize function pointer is NULL at the place and time the error happens.

    This pointer is supposed to be initialized to point to a proper function the first time that sqlite3_initialize() is called, which normally happens the first time you open an SQLite database file. By setting breakpoints and watchpoints in GDB I was able to verify that the pointer is successfully set, yet at the time of the segmentation fault its value is NULL.
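To make the failure mode concrete, here is a minimal sketch of what calling through an uninitialized function-pointer table looks like. It is not SQLite's code: `MemMethods`, `GlobalConfig`, `g_config`, `demo_size`, `demo_initialize`, `checked_size` and `size_after_init` are hypothetical names that only mimic the shape of `sqlite3Config.m.xSize`. It shows why the debugger reports `0x0000000000000000 in ?? ()` and why Valgrind reports a jump to address 0x0 from inside `sqlite3_free`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for a library's global configuration: a table of
// function pointers that stays zero-initialized until an init routine runs.
struct MemMethods {
    std::size_t (*xSize)(const void*);  // null until initialized
};

struct GlobalConfig {
    MemMethods m;
};

static GlobalConfig g_config = {};  // static storage: xSize starts as nullptr

// Illustrative implementation that the init routine installs.
static std::size_t demo_size(const void*) { return 16; }

// Stand-in for the library's one-time initialization.
void demo_initialize() { g_config.m.xSize = demo_size; }

// Calling g_config.m.xSize(p) while it is null is undefined behavior; on
// x86-64 Linux it typically ends as a jump to address 0 and a SIGSEGV.
// This guarded variant reports the problem instead of crashing.
bool checked_size(const void* p, std::size_t& out) {
    if (g_config.m.xSize == nullptr) {
        std::fprintf(stderr, "xSize is null: config not initialized\n");
        return false;
    }
    out = g_config.m.xSize(p);
    return true;
}

// Debug-build alternative: state the precondition with an assertion.
std::size_t size_after_init(const void* p) {
    assert(g_config.m.xSize != nullptr && "call demo_initialize() first");
    return g_config.m.xSize(p);
}
```

A guard like `checked_size` is only a diagnostic aid for tracking down which code path sees the null pointer; it does not explain why the pointer is null at that point.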

    That could mean one of two things:

    • The new pointer value is not properly propagated to all threads. SQLite3 is supposed to be thread-safe, but well, threads can be nasty little buggers...

    • Something resets the pointer after it has been initialized. I considered this highly unlikely since the sqlite3Config structure is not usually modified after initialization.

    I performed a simple test, which incidentally can be used as a temporary workaround: I added an explicit call to sqlite3_initialize() as the first statement in main(), so that it is executed before any threads are launched. As a result, the segmentation fault went away and I got a shell prompt for your server, which points to the first of the two alternatives. Note that this is a workaround at best: SQLite normally calls sqlite3_initialize() automatically, so an explicit call should not be needed. The root cause of the issue may still be present and make itself known in some other way, or, worse, it could break things in subtle, hard-to-detect ways.
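The ordering used in the workaround can be sketched as follows. This is a self-contained model under stated assumptions, not the server's code: `g_xSize`, `demo_size`, `demo_initialize`, `run_workers` and `start_server` are hypothetical names, and `demo_initialize()` stands in for the real `sqlite3_initialize()` call from `<sqlite3.h>`. The point is only the call order: initialize the shared state on the main thread first, then start the workers.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the library's shared function pointer.
static std::size_t (*g_xSize)(const void*) = nullptr;

static std::size_t demo_size(const void*) { return 16; }

// Stand-in for sqlite3_initialize(): installs the function pointer.
void demo_initialize() { g_xSize = demo_size; }

// Launches n workers that each call through the shared pointer,
// then returns the sum of their results.
std::size_t run_workers(int n) {
    std::vector<std::size_t> results(static_cast<std::size_t>(n));
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        workers.emplace_back([&results, i] {
            // Starting a std::thread synchronizes with the start of its
            // function, so writes made before the launch are visible here.
            assert(g_xSize != nullptr);
            results[i] = g_xSize(nullptr);
        });
    }
    for (auto& t : workers) t.join();

    std::size_t total = 0;
    for (std::size_t r : results) total += r;
    return total;
}

// Intended order at program start: initialize first, launch threads second.
std::size_t start_server(int worker_count) {
    demo_initialize();                 // single-threaded initialization
    return run_workers(worker_count);  // only now start the threads
}
```

In the real server the equivalent would be to call `sqlite3_initialize()` as the first statement of `main()` and check that it returns `SQLITE_OK` before any plugin is loaded or any thread is created.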

    Since SQLite3 is supposed to be thread-safe (and the source code of the sqlite3_initialize() function seems correct in that regard), I am unsure what is happening. It could be a problem with the sqlite3pp wrapper or with the way the threads are launched.
