Can the JVM recover from an OutOfMemoryError without a restart?


Question


Can the JVM recover from an OutOfMemoryError without a restart if it gets a chance to run the GC before more object allocation requests come in?

Do the various JVM implementations differ in this respect?

My question is about the JVM recovering, not about the user program trying to recover by catching the error. In other words, if an OOME is thrown in an application server (jboss/websphere/..), do I have to restart it? Or can I let it run if further requests seem to work without a problem?

Recommended answer

It may work, but it is generally a bad idea. There is no guarantee that your application will succeed in recovering, or that it will know if it has not succeeded. For example:


  • There really may not be enough memory to do the requested tasks, even after taking recovery steps like releasing a block of reserved memory. In this situation, your application may get stuck in a loop where it repeatedly appears to recover and then runs out of memory again.

  • The OOME may be thrown on any thread. If an application thread or library is not designed to cope with it, this might leave some long-lived data structure in an incomplete or inconsistent state.

  • If threads die as a result of the OOME, the application may need to restart them as part of the OOME recovery. At the very least, this makes the application more complicated.

  • Suppose that a thread synchronizes with other threads using notify/wait or some higher-level mechanism. If that thread dies from an OOME, other threads may be left waiting for notifies (etc.) that never come. Designing for this could make the application significantly more complicated.

In summary, designing, implementing and testing an application to recover from OOMEs can be difficult, especially if the application (or the framework in which it runs, or any of the libraries it uses) is multi-threaded. It is a better idea to treat an OOME as a fatal error.
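One way to act on "treat OOME as a fatal error" is a default uncaught-exception handler that stops the process so a supervisor can restart it. The following is only a sketch under assumptions: the class name `FatalOomeHandler`, the exit status 3, and the injectable `exitAction` are all made up for illustration, and the handler only sees errors that actually escape a thread (a framework that catches `Throwable` will hide them).

```java
import java.util.function.IntConsumer;

/** Sketch: treat an OutOfMemoryError that escapes any thread as fatal. */
final class FatalOomeHandler implements Thread.UncaughtExceptionHandler {
    // Hypothetical exit status; use whatever your process supervisor expects.
    static final int OOME_EXIT_STATUS = 3;

    // Injected so the decision logic can be exercised without killing the JVM.
    private final IntConsumer exitAction;

    FatalOomeHandler(IntConsumer exitAction) {
        this.exitAction = exitAction;
    }

    /** Installs the handler process-wide. halt() skips shutdown hooks, which might themselves need memory. */
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler(
                new FatalOomeHandler(status -> Runtime.getRuntime().halt(status)));
    }

    @Override
    public void uncaughtException(Thread thread, Throwable error) {
        if (containsOome(error)) {
            exitAction.accept(OOME_EXIT_STATUS);
        } else {
            System.err.println("Uncaught exception in " + thread.getName() + ": " + error);
        }
    }

    /** Walks a bounded part of the cause chain looking for an OutOfMemoryError. */
    static boolean containsOome(Throwable error) {
        int depth = 0;
        for (Throwable t = error; t != null && depth < 32; t = t.getCause(), depth++) {
            if (t instanceof OutOfMemoryError) {
                return true;
            }
        }
        return false;
    }
}
```

Calling `FatalOomeHandler.install()` early in `main` would enable it. On HotSpot JVMs that support it, the `-XX:+ExitOnOutOfMemoryError` flag achieves a similar effect without any code.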

See also my answer to a related question.

EDIT - in response to this followup question:


"In other words if an OOME is thrown in an application server (jboss/websphere/..) do I have to restart it?"

No, you don't have to restart. But it is probably wise to, especially if you don't have a good, automated way of checking that the service is running correctly.

The JVM will recover just fine. But the application server and the application itself may or may not recover, depending on how well they are designed to cope with this situation. (My experience is that some app servers are not designed to cope with this, and that designing and implementing a complicated application to recover from OOMEs is hard, and testing it properly is even harder.)
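The claim that the JVM itself keeps running can be illustrated with a small sketch. It makes one loud assumption: that a request for `new long[Integer.MAX_VALUE]` fails with an `OutOfMemoryError` on the JVM at hand (it is assumed to be too large for typical heaps and array size limits). This is a single failed request on one thread, not real heap exhaustion under load, so it says nothing about whether an application's own state survives.

```java
/** Sketch: an OutOfMemoryError is an ordinary Throwable; the JVM keeps running after it. */
final class OomeSurvivalDemo {
    /** Returns true if a small allocation succeeds after an OutOfMemoryError was thrown. */
    static boolean jvmSurvivesOome() {
        try {
            // Assumption: this request is too large to be satisfied, so it fails
            // fast with an OutOfMemoryError instead of slowly filling the heap.
            long[] huge = new long[Integer.MAX_VALUE];
            return huge.length < 0; // not expected to be reached; yields false if it is
        } catch (OutOfMemoryError e) {
            System.out.println("Caught: " + e);
        }
        // The failed request did not damage the heap; later allocations still work.
        byte[] small = new byte[1024];
        return small.length == 1024;
    }

    public static void main(String[] args) {
        System.out.println("JVM still usable: " + jvmSurvivesOome());
    }
}
```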

EDIT 2

In response to this comment:


"other threads may be left waiting for notifies (etc) that never come" Really? Wouldn't the killed thread unwind its stack, releasing resources as it goes, including held locks?

Yes, really! Consider this:

Thread #1 runs this:

    synchronized (lock) {
        // Re-check the condition in a loop to guard against spurious wakeups.
        while (!someCondition) {
            lock.wait();
        }
    }
    // ...

Thread #2 runs this:

    synchronized (lock) {
        // do stuff
        lock.notify();
    }

If Thread #1 is waiting on the notify, and Thread #2 gets an OOME in the // do stuff section, then Thread #2 won't make the notify() call, and Thread #1 may get stuck forever waiting for a notification that will never occur. Sure, Thread #2 is guaranteed to release the mutex on the lock object ... but that is not sufficient!
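The scenario can be reproduced in a self-contained sketch. Everything here is illustrative: the class `LostNotifyDemo` is hypothetical, the OOME is simulated by throwing `new OutOfMemoryError(...)` by hand, "do stuff" is assumed to be the step that makes the condition true, and Thread #1 uses a timed `wait` purely so the demo terminates (with the untimed `lock.wait()` it would hang).

```java
/** Sketch: a notifier that dies before notify() leaves its waiter stranded. */
final class LostNotifyDemo {
    private final Object lock = new Object();
    private boolean someCondition = false; // guarded by lock

    /** Thread #1: waits for the condition, giving up after timeoutMillis so the demo ends. */
    private boolean awaitCondition(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        synchronized (lock) {
            while (!someCondition) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    return false; // timed out; wait(0) would block indefinitely
                }
                lock.wait(remainingMillis);
            }
            return true;
        }
    }

    /** Thread #2: if "do stuff" fails, the condition is never set and notify() never runs. */
    private void doStuffAndNotify(boolean failWithOome) {
        synchronized (lock) {
            if (failWithOome) {
                throw new OutOfMemoryError("simulated for the demo");
            }
            someCondition = true;
            lock.notify();
        } // the monitor is released either way, but that does not wake Thread #1
    }

    /** Runs both threads; returns true if Thread #1 saw the condition before the timeout. */
    static boolean run(boolean failWithOome, long timeoutMillis) {
        LostNotifyDemo demo = new LostNotifyDemo();
        boolean[] signalled = {false};
        Thread waiter = new Thread(() -> {
            try {
                signalled[0] = demo.awaitCondition(timeoutMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "thread-1");
        Thread notifier = new Thread(() -> demo.doStuffAndNotify(failWithOome), "thread-2");
        notifier.setUncaughtExceptionHandler(
                (t, e) -> System.out.println(t.getName() + " died: " + e));
        waiter.start();
        notifier.start();
        try {
            notifier.join();
            waiter.join(); // join makes the waiter's write to signalled[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return signalled[0];
    }

    public static void main(String[] args) {
        System.out.println("normal run signalled: " + run(false, 500));
        System.out.println("OOME run signalled:   " + run(true, 500));
    }
}
```

A timed wait lets Thread #1 notice that something is wrong, but it still cannot tell a slow notifier from a dead one.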

"If not, the code run by the thread is not exception safe, which is a more general problem."



"Exception safe" is not a term I've heard of (though I know what you mean). Java programs are not normally designed to be resilient to unexpected exceptions. Indeed, in a scenario like the above, it is likely to be somewhere between hard and impossible to make the application exception safe.

You'd need some mechanism whereby the failure of Thread #2 (due to the OOME) gets turned into an inter-thread communication failure notification to Thread #1. Erlang does this ... but not Java. The reason they can do this in Erlang is that Erlang processes communicate using strict CSP-like primitives; i.e. there is no sharing of data structures!
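Java has no general mechanism of this kind for threads that share data structures, but for the narrow case where the interaction can be modelled as "submit a task, wait for its result", `java.util.concurrent.Future` does report the worker's failure to the waiter. The sketch below is an illustration under assumptions: `FailurePropagationDemo` is a hypothetical class, the OOME is simulated, and after a real OOME there may not be enough memory left to construct the `ExecutionException` at all.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch: a Future turns a worker's failure into an exception on the waiting thread. */
final class FailurePropagationDemo {
    /** Returns a description of what the waiting thread observed. */
    static String awaitWorker(boolean failWithOome) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> result = pool.submit(() -> {
                if (failWithOome) {
                    throw new OutOfMemoryError("simulated for the demo");
                }
                return "done";
            });
            // get() either returns the value or rethrows the worker's failure, wrapped.
            return "worker finished: " + result.get();
        } catch (ExecutionException e) {
            return "worker failed: " + e.getCause();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(awaitWorker(false));
        System.out.println(awaitWorker(true));
    }
}
```

This does nothing for shared state that the failed worker left half-updated, which is the harder part of the problem described above.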

(Note that you could get the above problem for just about any unexpected exception, not just Error exceptions. There are certain kinds of Java code where attempting to recover from an unexpected exception is likely to end badly.)
