What can cause a Java native function (in C) to segfault upon entry?


Problem Description

The Project

I'm writing a Java command line interface to a C library of internal networking and network testing tools using the Java Native Interface. The C code (which I didn't write) is complex and low level, often manipulates memory at the bit level, and uses raw sockets exclusively. The application is multi-threaded from the C side (pthreads running in the background) as well as the Java side (ScheduledThreadPoolExecutors running threads that call native code). That said, the C library should be mostly stable. The Java and JNI interface code, as it turns out, is causing problems.

The Problem(s)

The application crashes with a segmentation fault upon entry into a native C function. This only happens when the program is in a specific state (i.e. successfully running a specific native function causes the next call to another specific native function to segfault). Additionally, the application crashes with a similar-looking segfault when the quit command is issued, but again, only after successfully running that same specific native function.

I'm an inexperienced C developer and an experienced Java developer -- I'm used to crashes giving me a specific reason and a specific line number. All I have to work from in this case is the hs_err_pid*.log output and the core dump. I've included what I could at the end of this question.

My Work So Far

  1. Naturally, I wanted to find the specific line of code where the crash happened. I placed a System.out.println() right before the native call on the Java side and a printf() as the first line of the native function where the program crashes, being sure to call fflush(stdout) directly after. The System.out call ran and the printf call didn't. This tells me that the segfault happened upon entry into the function -- something I've never seen before.
  2. I triple-checked the parameters to the function, to ensure that they wouldn't act up. However, I only pass one parameter (of type jint). The other two (JNIEnv *env, jobject j_object) are JNI constructs and out of my control.
  3. I commented out every single line in the function, leaving only a return 0; at the end. The segfault still happened. This leads me to believe that the problem is not in this function.
  4. I ran the commands in different orders (effectively running the native functions in different orders). The segfaults only happen when one specific native function is run before the crashing function call. This specific function appears to behave properly when it is run.
  5. I printed the value of the env pointer and the value of &j_object near the end of this other function, to ensure that I didn't somehow corrupt them. I don't know if I corrupted them, but both have non-zero values upon exiting the function.
  6. Edit 1: Typically, the same function is run in many threads (not usually concurrently, but it should be thread safe). I ran the function from the main thread without any other threads active to ensure that multithreading on the Java side wasn't causing the issue. It wasn't, and I got the same segfault.

All of this perplexes me. Why does it still segfault if I comment out the whole function, except for the return statement? If the problem is in this other function, why doesn't it fail there? If it's a problem where the first function messes up the memory and the second function illegally accesses the corrupt memory, why doesn't it fail on the line with the illegal access, rather than on entry to the function?

If you know of an article where someone explains a problem similar to mine, please link it in a comment. There are so many segfault articles, and none seem to cover this specific problem. Ditto for SO questions. The problem may also be that I'm not experienced enough to apply an abstract solution to this problem.

My Question

What can cause a Java native function (in C) to segfault upon entry like this? What specific things can I look for that will help me squash this bug? How can I write code in the future that will help me avoid this problem?

Helpful Info

For the record, I can't actually post the code. If you think a description of the code would be helpful, comment and I'll edit it in.

Error Message

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00002aaaaaf6d9c3, pid=2185, tid=1086892352
#
# JRE version: 6.0_21-b06
# Java VM: Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode linux-amd64 )
# Problematic frame:
# j  path.to.my.Object.native_function_name(I)I+0
#
# An error report file with more information is saved as:
# /path/to/hs_err_pid2185.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

The Important Bits of the hs_err_pid*.log File

---------------  T H R E A D  ---------------

Current thread (0x000000004fd13800):  JavaThread "pool-1-thread-1" [_thread_in_native, id=2198, stack(0x0000000040b8a000,0x0000000040c8b000)]

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=128 (), si_addr=0x0000000000000000

Registers:
RAX=0x34372e302e3095e1, RBX=0x00002aaaae39dcd0, RCX=0x0000000000000000, RDX=0x0000000000000000
RSP=0x0000000040c89870, RBP=0x0000000040c898c0, RSI=0x0000000040c898e8, RDI=0x000000004fd139c8
R8 =0x000000004fb631f0, R9 =0x000000004faf5d30, R10=0x00002aaaaaf6d999, R11=0x00002b1243b39580
R12=0x00002aaaae3706d0, R13=0x00002aaaae39dcd0, R14=0x0000000040c898e8, R15=0x000000004fd13800
RIP=0x00002aaaaaf6d9c3, EFL=0x0000000000010202, CSGSFS=0x0000000000000033, ERR=0x0000000000000000
  TRAPNO=0x000000000000000d



Stack: [0x0000000040b8a000,0x0000000040c8b000],  sp=0x0000000040c89870,  free space=3fe0000000000000018k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
j  path.to.my.Object.native_function_name(I)I+0
j  path.to.my.Object$CustomThread.fire()V+18
j  path.to.my.CustomThreadSuperClass.run()V+1
j  java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object;+4
j  java.util.concurrent.FutureTask$Sync.innerRun()V+30
j  java.util.concurrent.FutureTask.run()V+4
j  java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Ljava/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTask;)V+1
j  java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V+15
j  java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Ljava/lang/Runnable;)V+59
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+28
j  java.lang.Thread.run()V+11
v  ~StubRoutines::call_stub
V  [libjvm.so+0x3e756d]
V  [libjvm.so+0x5f6f59]
V  [libjvm.so+0x3e6e39]
V  [libjvm.so+0x3e6eeb]
V  [libjvm.so+0x476387]
V  [libjvm.so+0x6ee452]
V  [libjvm.so+0x5f80df]

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  path.to.my.Object.native_function_name(I)I+0
j  path.to.my.Object$CustomThread.fire()V+18
j  path.to.my.CustomThreadSuperClass.run()V+1
j  java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object;+4
j  java.util.concurrent.FutureTask$Sync.innerRun()V+30
j  java.util.concurrent.FutureTask.run()V+4
j  java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Ljava/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTask;)V+1
j  java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V+15
j  java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Ljava/lang/Runnable;)V+59
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+28
j  java.lang.Thread.run()V+11
v  ~StubRoutines::call_stub



---------------  P R O C E S S  ---------------

Java Threads: ( => current thread )
  0x000000004fabc800 JavaThread "pool-1-thread-6" [_thread_new, id=2203, stack(0x0000000000000000,0x0000000000000000)]
  0x000000004fbcb000 JavaThread "pool-1-thread-5" [_thread_blocked, id=2202, stack(0x0000000042c13000,0x0000000042d14000)]
  0x000000004fbc9800 JavaThread "pool-1-thread-4" [_thread_blocked, id=2201, stack(0x0000000042b12000,0x0000000042c13000)]
  0x000000004fbc7800 JavaThread "pool-1-thread-3" [_thread_blocked, id=2200, stack(0x0000000042a11000,0x0000000042b12000)]
  0x000000004fc54800 JavaThread "pool-1-thread-2" [_thread_blocked, id=2199, stack(0x0000000042910000,0x0000000042a11000)]
=>0x000000004fd13800 JavaThread "pool-1-thread-1" [_thread_in_native, id=2198, stack(0x0000000040b8a000,0x0000000040c8b000)]
  0x000000004fb04800 JavaThread "Low Memory Detector" daemon [_thread_blocked, id=2194, stack(0x0000000041d0d000,0x0000000041e0e000)]
  0x000000004fb02000 JavaThread "CompilerThread1" daemon [_thread_blocked, id=2193, stack(0x0000000041c0c000,0x0000000041d0d000)]
  0x000000004fafc800 JavaThread "CompilerThread0" daemon [_thread_blocked, id=2192, stack(0x0000000040572000,0x0000000040673000)]
  0x000000004fafa800 JavaThread "Signal Dispatcher" daemon [_thread_blocked, id=2191, stack(0x0000000040471000,0x0000000040572000)]
  0x000000004fad6000 JavaThread "Finalizer" daemon [_thread_blocked, id=2190, stack(0x0000000041119000,0x000000004121a000)]
  0x000000004fad4000 JavaThread "Reference Handler" daemon [_thread_blocked, id=2189, stack(0x0000000041018000,0x0000000041119000)]
  0x000000004fa51000 JavaThread "main" [_thread_in_vm, id=2186, stack(0x00000000418cc000,0x00000000419cd000)]

Other Threads:
  0x000000004facf800 VMThread [stack: 0x0000000040f17000,0x0000000041018000] [id=2188]
  0x000000004fb0f000 WatcherThread [stack: 0x0000000041e0e000,0x0000000041f0f000] [id=2195]

VM state:not at safepoint (normal execution)

VM Mutex/Monitor currently owned by a thread: None

Heap
 PSYoungGen      total 305856K, used 31465K [0x00002aaadded0000, 0x00002aaaf3420000, 0x00002aaaf3420000)
  eden space 262208K, 12% used [0x00002aaadded0000,0x00002aaadfd8a6a8,0x00002aaaedee0000)
  from space 43648K, 0% used [0x00002aaaf0980000,0x00002aaaf0980000,0x00002aaaf3420000)
  to   space 43648K, 0% used [0x00002aaaedee0000,0x00002aaaedee0000,0x00002aaaf0980000)
 PSOldGen        total 699072K, used 0K [0x00002aaab3420000, 0x00002aaadded0000, 0x00002aaadded0000)
  object space 699072K, 0% used [0x00002aaab3420000,0x00002aaab3420000,0x00002aaadded0000)
 PSPermGen       total 21248K, used 3741K [0x00002aaaae020000, 0x00002aaaaf4e0000, 0x00002aaab3420000)
  object space 21248K, 17% used [0x00002aaaae020000,0x00002aaaae3c77c0,0x00002aaaaf4e0000)


VM Arguments:
jvm_args: -Xms1024m -Xmx1024m -XX:+UseParallelGC


---------------  S Y S T E M  ---------------

OS:Red Hat Enterprise Linux Client release 5.5 (Tikanga)

uname:Linux 2.6.18-194.8.1.el5 #1 SMP Wed Jun 23 10:52:51 EDT 2010 x86_64
libc:glibc 2.5 NPTL 2.5
rlimit: STACK 10240k, CORE 102400k, NPROC 10000, NOFILE 1024, AS infinity
load average:0.21 0.08 0.05

CPU:total 1 (1 cores per cpu, 1 threads per core) family 6 model 26 stepping 4, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt

Memory: 4k page, physical 3913532k(1537020k free), swap 1494004k(1494004k free)

vm_info: Java HotSpot(TM) 64-Bit Server VM (17.0-b16) for linux-amd64 JRE (1.6.0_21-b06), built on Jun 22 2010 01:10:00 by "java_re" with gcc 3.2.2 (SuSE Linux)

time: Tue Oct 15 15:08:13 2013
elapsed time: 13 seconds

Valgrind Output

I don't really know how to use Valgrind properly. This is what came up when running valgrind app arg1:

==2184== 
==2184== HEAP SUMMARY:
==2184==     in use at exit: 16,914 bytes in 444 blocks
==2184==   total heap usage: 673 allocs, 229 frees, 32,931 bytes allocated
==2184== 
==2184== LEAK SUMMARY:
==2184==    definitely lost: 0 bytes in 0 blocks
==2184==    indirectly lost: 0 bytes in 0 blocks
==2184==      possibly lost: 0 bytes in 0 blocks
==2184==    still reachable: 16,914 bytes in 444 blocks
==2184==         suppressed: 0 bytes in 0 blocks
==2184== Rerun with --leak-check=full to see details of leaked memory
==2184== 
==2184== For counts of detected and suppressed errors, rerun with: -v
==2184== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 7 from 7)

Edit 2:

GDB Output and Backtrace

I ran it through with GDB. I made sure that the C library was compiled with the -g flag.

$ gdb `which java`
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/java...(no debugging symbols found)...done.
(gdb) run -jar /opt/scts/scts.jar test.config
Starting program: /usr/bin/java -jar /opt/scts/scts.jar test.config
[Thread debugging using libthread_db enabled]
Executing new program: /usr/lib/jvm/java-1.6.0-sun-1.6.0.21.x86_64/jre/bin/java
[Thread debugging using libthread_db enabled]
[New Thread 0x4022c940 (LWP 3241)]
[New Thread 0x4032d940 (LWP 3242)]
[New Thread 0x4042e940 (LWP 3243)]
[New Thread 0x4052f940 (LWP 3244)]
[New Thread 0x40630940 (LWP 3245)]
[New Thread 0x40731940 (LWP 3246)]
[New Thread 0x40832940 (LWP 3247)]
[New Thread 0x40933940 (LWP 3248)]
[New Thread 0x40a34940 (LWP 3249)]

... my program does some work, and starts a background thread ...

[New Thread 0x41435940 (LWP 3250)]

... I type the command that seems to cause the segfault on the next command; the new threads are expected ...

[New Thread 0x41536940 (LWP 3252)]
[New Thread 0x41637940 (LWP 3253)]
[New Thread 0x41738940 (LWP 3254)]
[New Thread 0x41839940 (LWP 3255)]
[New Thread 0x4193a940 (LWP 3256)]

... I type the command that actually triggers the segfault. The new thread is expected, since the function is run in its own thread. If it did not segfault, it would have created the same number of threads as the previous command ...

[New Thread 0x41a3b940 (LWP 3257)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x41839940 (LWP 3255)]
0x00002aaaabcaec45 in ?? ()

... I furiously read through the gdb help, then run the backtrace ...

(gdb) bt
#0  0x00002aaaabcaec45 in ?? ()
#1  0x00002aaaf3ad7800 in ?? ()
#2  0x00002aaaf3ad81e8 in ?? ()
#3  0x0000000041838600 in ?? ()
#4  0x00002aaaeacddcd0 in ?? ()
#5  0x0000000041838668 in ?? ()
#6  0x00002aaaeace23f0 in ?? ()
#7  0x0000000000000000 in ?? ()

... Shouldn't that have symbols if I compiled with -g? I did, according to the lines from the output of make:

gcc -g -Wall -fPIC -c -I ...
gcc -g -shared -W1,soname, ...

Solution

Looks like I've solved the issue, which I'll outline here for the benefit of others.

What Happened

The cause of the segmentation fault was that I used sprintf() to write through a char * pointer that had never been initialized. Here is the bad code:

char* ip_to_string(uint32_t ip)
{
    unsigned char bytes[4];
    bytes[0] = ip & 0xFF;
    bytes[1] = (ip >> 8) & 0xFF;
    bytes[2] = (ip >> 16) & 0xFF;
    bytes[3] = (ip >> 24) & 0xFF;

    char *ip_string;
    sprintf(ip_string, "%d.%d.%d.%d", bytes[0], bytes[1], bytes[2], bytes[3]);
    return ip_string;
}

The pointer ip_string is never initialized here. That does not mean it points to nothing: its value is indeterminate, so it could point anywhere. By writing through it with sprintf(), I inadvertently overwrote an arbitrary piece of memory. I believe (though I never confirmed this) that the pointer happened to point somewhere on the stack, so the write clobbered data that later calls depended on. That would explain why the crash surfaced on entry to an unrelated function rather than at the sprintf() call itself.

One way to fix this is to allocate memory and then point the pointer to that memory, which can be accomplished with malloc(). That solution would look similar to this:

char* ip_to_string(uint32_t ip)
{
    unsigned char bytes[4];
    bytes[0] = ip & 0xFF;
    bytes[1] = (ip >> 8) & 0xFF;
    bytes[2] = (ip >> 16) & 0xFF;
    bytes[3] = (ip >> 24) & 0xFF;

    char *ip_string = malloc(16);
    sprintf(ip_string, "%d.%d.%d.%d", bytes[0], bytes[1], bytes[2], bytes[3]);
    return ip_string;
}

The problem with this is that every malloc() needs to be matched by a call to free(), or you have a memory leak. If I call free(ip_string) inside this function the returned pointer will be useless, and if I don't then I have to rely on the code that's calling this function to release the memory, which is pretty dangerous.

As far as I can tell, the "right" solution is to pass an already allocated buffer to the function, so that the function is only responsible for filling the pointed-to memory. That way, the calls to malloc() and free() can be made in the same block of code. Much safer. Here's the new function:

char* ip_to_string(uint32_t ip, char *ip_string)
{
    unsigned char bytes[4];
    bytes[0] = ip & 0xFF;
    bytes[1] = (ip >> 8) & 0xFF;
    bytes[2] = (ip >> 16) & 0xFF;
    bytes[3] = (ip >> 24) & 0xFF;

    sprintf(ip_string, "%d.%d.%d.%d", bytes[0], bytes[1], bytes[2], bytes[3]);
    return ip_string;
}

Answers to the Questions

What can cause a Java native function (in C) to segfault upon entry like this?

If you write through a pointer that was never pointed at allocated memory, you may accidentally overwrite memory on the stack. This may not cause an immediate failure, but will probably cause problems when you call other functions later.

What specific things can I look for that will help me squash this bug?

Debug it like any other segmentation fault: look for things like writing to unallocated memory or dereferencing a null pointer. I'm not an expert on this, but I'm willing to bet that there are many web resources for this.

How can I write code in the future that will help me avoid this problem?

Be careful with pointers, especially when you are responsible for creating them. If you see a line of code that looks like this:

type *variable;

... then look for a line that looks like ...

variable = ...;

... and make sure that this line comes before any write to the pointed-to memory.
