Java VM: reproducible SIGSEGV on both 1.6.0_17 and 1.6.0_18, how to report?

Problem description

EDIT: This reproducible SIGSEGV happens on a Linux machine with more than one processor and more than 2 GB of memory, so Java defaults to -server mode. Interestingly enough, if I force "-client" there's no crash anymore... (I'm still not too sure what to do with my reproducible SIGSEGV, but it's interesting nonetheless).
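
Since the crash seems to depend on whether the Server or the Client VM is selected, it may help to log which VM actually ran. A minimal sketch (the class name `VmInfo` is hypothetical) that prints the standard `java.version`/`java.vm.name` information plus the processor count and maximum heap size, using only system properties and `java.lang.management`:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

/** Hypothetical helper: reports which JVM flavour (e.g. Client or Server VM) is running. */
public class VmInfo {

    /** Builds a short multi-line description of the running JVM. */
    static String describe() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        Runtime rt = Runtime.getRuntime();
        StringBuilder sb = new StringBuilder();
        sb.append("java.version    = ").append(System.getProperty("java.version")).append('\n');
        sb.append("java.vm.name    = ").append(runtime.getVmName()).append('\n');
        sb.append("java.vm.version = ").append(runtime.getVmVersion()).append('\n');
        sb.append("processors      = ").append(rt.availableProcessors()).append('\n');
        sb.append("max heap bytes  = ").append(rt.maxMemory()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        // On the machine described above, a Server VM would report something like
        // "Java HotSpot(TM) Server VM" unless -client is forced.
        System.out.print(describe());
    }
}
```

Running it once with the default launcher and once with `-client` should show the VM name change.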

First note that this is somewhat related to, but not identical to, the following question, because in our case only a SIGSEGV happens, and we can reliably trigger it:

JVM OutOfMemory error "death spiral" (not memory leak): https://stackoverflow.com/questions/2297920/jvm-outofmemory-error-death-spiral-not-memory-leak

It's related because it happens when we feed our app with a "deluge of data": data are coming from text files and then number-crunched (yes, financial number crunching in Java).

I can reliably trigger a JVM to SIGSEGV using only valid Java code.

NOTE: I can invariably crash both JVM 1.6.0_17 and JVM 1.6.0_18, and this question is not about how to work around this issue (for example, playing with VM parameters may fix the issue, but I'm not after that; I want to know what to do with this always-reproducible SIGSEGV).

I've got a workaround, which simply consists of using Java 1.5 when launching our app (while still using Java 1.6 to run IntelliJ IDEA, etc. on the same machine, simultaneously), but my question is whether this should be reported and, if so, how to report it, knowing that the log itself contains proprietary information (the full hs_err_..._log).

Hardware errors can be ruled out:

  • this is happening on a workstation that regularly reaches months of uptime (I only reboot it when critical security patches affecting my trimmed-down and hardened Debian Linux are issued, which really doesn't happen often) and on which applications never crash (making it very unlikely that it's a hardware issue on that machine [more below])

  • the same application works perfectly on that same machine under a JVM 1.5 with the same load (this is how I'm testing the app: I simply launch it under a 1.5 VM)

  • the same application works perfectly fine on more than one hundred client machines under the same (gigantic) load (it never crashed once on Windows + JVM 1.5 or 1.6 and never crashed once on OS X + JVM 1.5 or 1.6 [a crash would mean an instant phone call from the client])

  • other applications on that same machine with the same 1.6.0_17 or 1.6.0_18 JVM never crash (for example, I've got two instances of IntelliJ IDEA running as two different users on that same machine and they don't crash)

  • the machine is tested with memtest "regularly" (before installing a new OS, which last happened when I installed Debian Lenny, not that long ago)

Here's the reproducible-on-demand SIGSEGV:

... $ uname -a
Linux saturn 2.6.26-2-686 #1 SMP Wed Nov 4 20:45:37 UTC 2009 i686 GNU/Linux
... $ export PATH=/home/wizard/jdk1.6.0_17/bin:$PATH
... $ java -version
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)

Launch the app, feed it a "deluge of data", wait a few seconds...

Then, invariably, for 1.6.0_17:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xb76d0080, pid=30793, tid=2514328464
#
# JRE version: 6.0_17-b04
# Java VM: Java HotSpot(TM) Server VM (14.3-b01 mixed mode linux-x86 )
# Problematic frame:
# V  [libjvm.so+0x4bc080]
#
# An error report file with more information is saved as:
# /home/wizard/hs_err_pid30793.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp

(note that the line '[libjvm.so+0x4bc080]' is consistent for 1.6.0_17 at every SIGSEGV)

Or, for 1.6.0_18:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xb77468f0, pid=722, tid=2514516880
#
# JRE version: 6.0_18-b07
# Java VM: Java HotSpot(TM) Server VM (16.0-b13 mixed mode linux-x86 )
# Problematic frame:
# V  [libjvm.so+0x4d88f0]
#
# An error report file with more information is saved as:
# /home/wizard/hs_err_pid722.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#
Aborted

(note that the line "[libjvm.so+0x4d88f0]" is consistent for 1.6.0_18 at every SIGSEGV)
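
Because the problematic frame is identical on every crash for a given JVM build, that `library+offset` pair (together with the JRE and VM version lines) is probably the most useful detail that can be shared without exposing application code. A small sketch, assuming the banner format shown above (the class name `CrashFrameParser` is hypothetical), that extracts it so that different runs can be compared:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls the "library+offset" frame out of a HotSpot crash banner. */
public class CrashFrameParser {

    // Matches a frame line such as "# V  [libjvm.so+0x4bc080]".
    // The letter is the frame type used in hs_err files (V/v = VM code, C = native,
    // J = compiled Java, j = interpreted Java).
    private static final Pattern FRAME = Pattern.compile(
            "^#[ \\t]+([VvCJj])[ \\t]+\\[([^+\\]]+)\\+(0x[0-9a-fA-F]+)\\]",
            Pattern.MULTILINE);

    /** Returns e.g. "libjvm.so+0x4bc080", or null if no frame line is present. */
    static String problematicFrame(String crashBanner) {
        Matcher m = FRAME.matcher(crashBanner);
        return m.find() ? m.group(2) + "+" + m.group(3) : null;
    }

    /** True if two crash banners point at the same library offset. */
    static boolean sameFrame(String bannerA, String bannerB) {
        String a = problematicFrame(bannerA);
        return a != null && a.equals(problematicFrame(bannerB));
    }

    public static void main(String[] args) {
        String run1 = "# Problematic frame:\n# V  [libjvm.so+0x4bc080]\n";
        String run2 = "# Problematic frame:\n# V  [libjvm.so+0x4bc080]\n";
        System.out.println(problematicFrame(run1)); // libjvm.so+0x4bc080
        System.out.println(sameFrame(run1, run2));  // true
    }
}
```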

The problem is that the log file contains proprietary information that cannot be shared.

Writing a "tiny test case" that reproduces the issue isn't realistic either: as with the issue linked above, this only happens when a "deluge of data" is fed to the app.

Note that the exact same application, on exactly the same hardware, with exactly the same JVM but another version of Linux (I had Debian Etch previously) did NOT trigger that SIGSEGV once.

But this doesn't mean the JVM isn't at fault: it could still be a JVM issue.

Should I report this, and if so, how? (keeping in mind that writing a "reproducible tiny test case" is delusional and that the log contains proprietary information that shouldn't be leaked). Should I just edit the log and send it?
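
If editing the log is the route taken, a scripted redaction is less error-prone than editing by hand. The sketch below is hypothetical: it assumes the proprietary parts can be identified by a package prefix (the `com.acme` prefix is made up) and that the command line appears on `java_command:`/`jvm_args:` lines of the hs_err file. It masks those while leaving the JVM's own frames, such as `libjvm.so+0x4bc080`, intact. The output still needs a manual review before it is sent anywhere.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: masks application-specific details in an hs_err log before sharing it. */
public class HsErrRedactor {

    // Assumed hs_err field names that carry the application's command line.
    private static final Pattern COMMAND_LINE =
            Pattern.compile("^(java_command|jvm_args):.*$", Pattern.MULTILINE);

    /** Replaces each secret (e.g. a package prefix or a home directory) with "<redacted>". */
    static String redact(String log, List<String> secrets) {
        String out = COMMAND_LINE.matcher(log).replaceAll("$1: <redacted>");
        for (String secret : secrets) {
            out = out.replace(secret, "<redacted>");
            // Class names may also appear in internal form, e.g. com/acme/Pricer.
            out = out.replace(secret.replace('.', '/'), "<redacted>");
        }
        return out;
    }

    public static void main(String[] args) {
        String log = "java_command: com.acme.Main /data/prices.txt\n"
                + "# V  [libjvm.so+0x4bc080]\n"
                + "j  com.acme.Pricer.run()V\n";
        System.out.print(redact(log, Arrays.asList("com.acme", "/home/wizard")));
    }
}
```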

What's the procedure for reporting such a reproducible SIGSEGV when your log contains proprietary information and a test case reproducing the issue isn't realistically doable?

Did any of you have success opening such a bug and then see it solved in a subsequent Java release?

Do you think it's good "for the Java community" to report such an issue or I just shouldn't bother because it's not important?

Recommended answer

I had a similar problem after upgrading to JDK 1.6.0_18, and it seems to be solved by the following options:

-server
-Xms256m
-Xmx748m
-XX:MaxPermSize=128m

-verbose:gc
-XX:+PrintGCTimeStamps
-Xloggc:/tmp/gc.log
-XX:+PrintHeapAtGC
-XX:+PrintGCDetails
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath="/tmp"

-XX:+UseParallelGC
-XX:-UseGCOverheadLimit

# The following options only enable remote monitoring with jconsole, which is useful to see the JVM's behaviour at runtime
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=12345
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Djava.rmi.server.hostname=MyHost
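
When options come from several launch scripts, it is worth confirming that they actually reached the JVM. A minimal sketch (the class name `JvmArgsCheck` is hypothetical) that uses the standard `RuntimeMXBean.getInputArguments()` to report which expected flags are missing; the expected list simply reuses flags from above:

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: checks that expected -XX flags were passed to the running JVM. */
public class JvmArgsCheck {

    /** Returns the expected flags that do not appear verbatim in the actual argument list. */
    static List<String> missing(List<String> actual, List<String> expected) {
        List<String> result = new ArrayList<String>();
        for (String flag : expected) {
            if (!actual.contains(flag)) {
                result.add(flag);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> actual = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + actual);
        List<String> expected = Arrays.asList(
                "-XX:+UseParallelGC", "-XX:-UseGCOverheadLimit", "-XX:+HeapDumpOnOutOfMemoryError");
        System.out.println("Missing: " + missing(actual, expected));
    }
}
```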

I still haven't double-checked (it is a production environment), but I think the error had two causes:

1) A wrong heap and/or permanent generation setting (I think JDK 1.6 needs more heap and permanent generation space than previous JVM versions) caused an OutOfMemoryError, but

2) in the original, wrong setting, somebody wrote

-XX:+HeapDumpOnOutOfMemoryError="/tmp"

instead of

-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath="/tmp"

so the JVM was probably not able to write the heap dump, and we only got the SIGSEGV (previous versions wrote the heap dump to the working directory).
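
The heap-dump settings can be checked on a running JVM. HotSpot exposes its flags through `com.sun.management.HotSpotDiagnosticMXBean`; the sketch below (HotSpot-specific, class name `HeapDumpSettings` hypothetical) prints the effective `HeapDumpOnOutOfMemoryError` and `HeapDumpPath` values. An empty `HeapDumpPath` means the dump would go to the working directory. The proxy is obtained with `ManagementFactory.newPlatformMXBeanProxy`, which exists on Java 6 as well as on later releases.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: shows the heap-dump settings the running HotSpot JVM actually uses. */
public class HeapDumpSettings {

    /** Looks up the HotSpot diagnostic MXBean (HotSpot-specific, not part of java.lang.management). */
    static HotSpotDiagnosticMXBean diagnosticBean() {
        try {
            return ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean.class);
        } catch (IOException e) {
            throw new IllegalStateException("cannot reach the platform MBean server", e);
        }
    }

    /** Returns the current value of a VM flag as a string, e.g. "true" or "/tmp". */
    static String flag(String name) {
        return diagnosticBean().getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError = " + flag("HeapDumpOnOutOfMemoryError"));
        // An empty value means the dump is written to the current working directory.
        System.out.println("HeapDumpPath               = \"" + flag("HeapDumpPath") + "\"");
    }
}
```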

Check the -server -XX:+UseParallelGC -XX:-UseGCOverheadLimit options too. I don't think playing with VM parameters is a workaround; it is the right approach, also because the garbage collector (among other things) changed between 1.5 and 1.6.
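
To see which collector a given set of flags actually selected, the standard `GarbageCollectorMXBean`s can be listed. A small sketch (the class name `ActiveCollectors` is hypothetical); with `-XX:+UseParallelGC`, HotSpot typically reports collectors named "PS Scavenge" and "PS MarkSweep", but the exact names vary between JVM versions and collectors:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists the garbage collectors the running JVM is using. */
public class ActiveCollectors {

    /** Returns the names of all garbage collectors registered with the platform. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collection count and accumulated time (ms) since JVM start; -1 if undefined.
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
    }
}
```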
