Java VM: reproducible SIGSEGV on both 1.6.0_17 and 1.6.0_18, how to report?


Problem description

EDIT: This reproducible SIGSEGV happens on a Linux machine with more than one processor and more than 2GB of memory, so Java defaults to -server mode. Interestingly enough, if I force "-client" there's no crash anymore... (I'm still not too sure what to do with my reproducible SIGSEGV, but it's interesting nonetheless.)
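
To confirm from inside the application which VM was actually selected, something like the following could be logged at startup. This is a minimal sketch: the class name `VmModeReport` is hypothetical, and it assumes a HotSpot JVM, where the standard `java.vm.name` system property typically contains "Server VM" or "Client VM".

```java
class VmModeReport {
    /** Builds a short description of the running VM from standard properties. */
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        StringBuilder sb = new StringBuilder();
        // On HotSpot this usually names the Server or Client VM (assumption).
        sb.append("java.vm.name=").append(System.getProperty("java.vm.name")).append('\n');
        sb.append("java.version=").append(System.getProperty("java.version")).append('\n');
        // Processor count and maximum heap as seen by this JVM.
        sb.append("processors=").append(rt.availableProcessors()).append('\n');
        sb.append("maxHeapBytes=").append(rt.maxMemory()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Running it once with `-server` and once with `-client` makes it easy to record which mode each crash (or non-crash) belongs to.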

First note that this is a bit related but not identical to the following because in our case it's only a SIGSEGV that happens, and we can reliably trigger it:

JVM OutOfMemory error "death spiral" (not memory leak): http://stackoverflow.com/questions/2297920/jvm-outofmemory-error-death-spiral-not-memory-leak

It's related because it happens when we feed our app with a "deluge of data": the data comes from text files and is then number-crunched (yes, financial number crunching in Java).

I can reliably trigger a JVM to SIGSEGV using only valid Java code.

NOTE: I can invariably crash both JVM 1.6.0_17 and JVM 1.6.0_18, and this question is not about how to work around this issue (for example, playing with VM parameters may fix the issue, but I'm not after that; I want to know what to do with this always-reproducible SIGSEGV).

I've got a workaround, which simply consists of using Java 1.5 when launching our app (while still using Java 1.6 to run IntelliJ IDEA, etc. on the same machine, simultaneously), but my question is whether this should be reported and, if it should, how to report it, knowing that the log itself contains proprietary information (the full hs_err_..._log).

A hardware error can be ruled out because:

  • this is happening on a workstation that regularly reaches months of uptime (I only reboot it when critical security patches affecting my trimmed-down and hardened Debian Linux are issued, which really doesn't happen often) and on which applications never crash (making it very unlikely that it's a hardware issue on that machine [more below])

  • the same application works perfectly on that same machine under a JVM 1.5 under the same load (this is how I'm testing the app: I simply launch it under a 1.5 VM)

  • the same application works perfectly fine on more than one hundred client machines under the same (gigantic) load (it never crashed once on Windows + JVM 1.5 or 1.6 and never crashed once on OS X + JVM 1.5 or 1.6 [a crash would mean an instant phone call from the client])

  • other applications on that same machine and the same 1.6.0_17 or 1.6.0_18 JVM never crash (for example, I've got two instances of IntelliJ IDEA running as two different users on that same machine and they don't crash)

  • machine is tested with memtest "regularly" (before installing a new OS, which last happened when I installed Debian Lenny, not that long ago)

Here's the reproducible-on-demand SIGSEGV:

... $ uname -a
Linux saturn 2.6.26-2-686 #1 SMP Wed Nov 4 20:45:37 UTC 2009 i686 GNU/Linux
... $ export PATH=/home/wizard/jdk1.6.0_17/bin:$PATH
... $ java -version
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)

Launch the app, feed it a "deluge of data", wait a few seconds...

Then, invariably, for 1.6.0_17:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xb76d0080, pid=30793, tid=2514328464
#
# JRE version: 6.0_17-b04
# Java VM: Java HotSpot(TM) Server VM (14.3-b01 mixed mode linux-x86 )
# Problematic frame:
# V  [libjvm.so+0x4bc080]
#
# An error report file with more information is saved as:
# /home/wizard/hs_err_pid30793.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp

(note that the line '[libjvm.so+0x4bc080]' is consistent for 1.6.0_17 at every SIGSEGV)

or for 1.6.0_18:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xb77468f0, pid=722, tid=2514516880
#
# JRE version: 6.0_18-b07
# Java VM: Java HotSpot(TM) Server VM (16.0-b13 mixed mode linux-x86 )
# Problematic frame:
# V  [libjvm.so+0x4d88f0]
#
# An error report file with more information is saved as:
# /home/wizard/hs_err_pid722.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#
Aborted

(note that the line "[libjvm.so+0x4d88f0]" is consistent for 1.6.0_18 at every SIGSEGV)

The problem is that the log file contains proprietary information that cannot be shared.

Producing a "tiny test case" that reproduces the issue isn't realistic either: as with the issue linked above, this only happens when a "deluge of data" is fed to the app.

Note that the exact same application, on exactly the same hardware, with exactly the same JVM but another version of Linux (I had Debian Etch previously) did NOT trigger that SIGSEGV once.

But this doesn't mean the JVM isn't at fault: it could still be a JVM issue.

Should I report this and how? (keeping in mind that writing a "reproducible tiny test case" is delusional and that the log contains proprietary information that shouldn't be leaked). Should I just edit the log and send it?

What's the procedure to report such a reproducible SIGSEGV when your log contains proprietary information and when a test case reproducing the issue isn't realistically doable?
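
If editing the log before sending it turns out to be acceptable, the redaction itself could be scripted rather than done by hand. The following is only a sketch under stated assumptions: `HsErrRedactor` is a hypothetical helper, and `com.example.secret` stands in for whatever package prefix is proprietary. It replaces every fully qualified name under that prefix (in dotted or slash-separated form) with `REDACTED` and leaves everything else, including the JVM's own frames, untouched.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;

class HsErrRedactor {
    // Hypothetical proprietary prefix; replace it with your own package name.
    // Matches the prefix plus any further identifier segments separated by '.' or '/'.
    static final Pattern PROPRIETARY =
            Pattern.compile("com[./]example[./]secret([./][A-Za-z_$][\\w$]*)*");

    /** Replaces every proprietary name in one line with a fixed marker. */
    static String redactLine(String line) {
        return PROPRIETARY.matcher(line).replaceAll("REDACTED");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java HsErrRedactor <input log> <output log>");
            return;
        }
        // ISO-8859-1 maps every byte to a character, so unusual bytes cannot abort decoding.
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
             BufferedWriter out = Files.newBufferedWriter(Paths.get(args[1]), StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(redactLine(line));
                out.newLine();
            }
        }
    }
}
```

A pattern like this only catches class and package names. Command-line arguments, file paths, and environment variables in the log would still need a manual review before the file leaves the building.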

Has any of you had success opening such a bug and then seeing it solved in a subsequent Java release?

Do you think it's good "for the Java community" to report such an issue, or should I just not bother because it's not important?

Solution

I had a similar problem when upgrading to JDK 1.6.0_18, and it seems to be solved by using the following options:

-server
-Xms256m
-Xmx748m
-XX:MaxPermSize=128m

-verbose:gc
-XX:+PrintGCTimeStamps
-Xloggc:/tmp/gc.log
-XX:+PrintHeapAtGC
-XX:+PrintGCDetails
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath="/tmp"

-XX:+UseParallelGC
-XX:-UseGCOverheadLimit

# The following options are only for remote monitoring with jconsole, useful for seeing JVM behaviour at runtime
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=12345
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Djava.rmi.server.hostname=MyHost

I haven't double-checked yet (it is a production environment), but I think the error had two causes:

1) A wrong setting for the heap and/or permanent generation space (I think JDK 1.6 needs more heap and permanent space than previous JVM versions) caused an OutOfMemoryError, but

2) in the wrong original settings somebody wrote

-XX:+HeapDumpOnOutOfMemoryError="/tmp"

and not

-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath="/tmp"

so the JVM was probably not able to write the heap dump and we got only a SIGSEGV (previous versions wrote the heap dump in the working directory).
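
A mistake of this kind can be caught mechanically. The sketch below (the class name `XxFlagLint` is hypothetical) flags any argument that uses the boolean `-XX:+Name` / `-XX:-Name` form but also carries an `=value`, since boolean flags take no value. It makes no claim about how a given JVM version reacts to such an argument.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class XxFlagLint {
    /** Returns the arguments that look like boolean -XX flags with a value attached. */
    static List<String> suspicious(List<String> jvmArgs) {
        List<String> bad = new ArrayList<String>();
        for (String arg : jvmArgs) {
            boolean booleanForm = arg.startsWith("-XX:+") || arg.startsWith("-XX:-");
            if (booleanForm && arg.indexOf('=') >= 0) {
                bad.add(arg);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // The two settings discussed above, plus one well-formed boolean flag.
        List<String> sample = Arrays.asList(
                "-XX:+HeapDumpOnOutOfMemoryError=\"/tmp\"",
                "-XX:HeapDumpPath=\"/tmp\"",
                "-XX:+UseParallelGC");
        System.out.println("sample: " + suspicious(sample));

        // The same check applied to the arguments of the running JVM.
        List<String> live = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("this JVM: " + suspicious(live));
    }
}
```

The same function can be pointed at the lines of a startup script or options file before a deployment.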

Check the -server, -XX:+UseParallelGC and -XX:-UseGCOverheadLimit options too. I think playing with VM parameters is not a workaround but the right approach, also because the garbage collector (and not only that) changed between 1.5 and 1.6.
