Can the JVM recover from an OutOfMemoryError without a restart

Question

  1. Can a JVM recover from an OutOfMemoryError without a restart if it gets a chance to run the GC before more object allocation requests come in?

  2. Do the various JVM implementations differ in this respect?

My question is about the JVM recovering, not about the user program trying to recover by catching the error. In other words, if an OOME is thrown in an application server (jboss/websphere/..), do I have to restart it? Or can I let it run if further requests seem to work without a problem?

Recommended answer

It may work, but it is generally a bad idea. There is no guarantee that your application will succeed in recovering, or that it will know if it has not succeeded. For example:

  • There may really not be enough memory to do the requested tasks, even after taking recovery steps such as releasing a block of reserved memory. In this situation, your application may get stuck in a loop where it repeatedly appears to recover and then runs out of memory again.

  • The OOME may be thrown on any thread. If an application thread or library is not designed to cope with it, this might leave some long-lived data structure in an incomplete or inconsistent state.

  • If threads die as a result of the OOME, the application may need to restart them as part of the OOME recovery. At the very least, this makes the application more complicated.

  • Suppose that a thread synchronizes with other threads using notify/wait or some higher-level mechanism. If that thread dies from an OOME, other threads may, for example, be left waiting for notifications that never come. Designing for this could make the application significantly more complicated.

In summary, designing, implementing and testing an application to recover from OOMEs can be difficult, especially if the application (or the framework in which it runs, or any of the libraries it uses) is multi-threaded. It is a better idea to treat OOME as a fatal error.
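One way to treat an OOME as fatal can be sketched with a default uncaught-exception handler. This is only an illustration under stated assumptions: the class name `FatalOomHandler`, the `onFatal` hook, and the exit status 3 are hypothetical and not part of the original answer. The handler only sees errors that actually escape a thread, so code that catches `Throwable` can still swallow an OOME before it gets here.

```java
/** Sketch: treat OutOfMemoryError as fatal via a last-resort handler (hypothetical class). */
final class FatalOomHandler implements Thread.UncaughtExceptionHandler {
    // Hypothetical hook so the policy can be tested; production code would halt the JVM.
    private final Runnable onFatal;

    FatalOomHandler(Runnable onFatal) {
        this.onFatal = onFatal;
    }

    /** Walks the cause chain (bounded), since an OOME may arrive wrapped in another exception. */
    static boolean isOutOfMemory(Throwable t) {
        int depth = 0;
        for (Throwable c = t; c != null && depth < 32; c = c.getCause(), depth++) {
            if (c instanceof OutOfMemoryError) {
                return true;
            }
        }
        return false;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable error) {
        if (isOutOfMemory(error)) {
            onFatal.run();
        } else {
            System.err.println("Uncaught exception in " + thread.getName() + ": " + error);
        }
    }

    /** Installs the handler for every thread that has no handler of its own. */
    static void install() {
        // halt() skips shutdown hooks, which might themselves need memory to run.
        // The exit status 3 is an arbitrary choice for this sketch.
        Thread.setDefaultUncaughtExceptionHandler(
                new FatalOomHandler(() -> Runtime.getRuntime().halt(3)));
    }
}
```

Calling `FatalOomHandler.install()` early in startup lets an external supervisor (a service manager or container runtime, for instance) restart the process. HotSpot-based JVMs also offer the `-XX:+ExitOnOutOfMemoryError` option, which has a similar effect without any code.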

See also my answer to a related question.

EDIT - in response to this follow-up question:

"In other words, if an OOME is thrown in an application server (jboss/websphere/..), do I have to restart it?"

No, you don't have to restart. But it is probably wise to, especially if you don't have a good, automated way of checking that the service is running correctly.

The JVM will recover just fine. But the application server and the application itself may or may not recover, depending on how well they are designed to cope with this situation. (My experience is that some app servers are not designed to cope with this, that designing and implementing a complicated application to recover from OOMEs is hard, and that testing it properly is even harder.)
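As a rough illustration of an automated check, the sketch below reports worker threads that have terminated. Everything in it is an assumption made for the example: the class `WorkerLivenessCheck`, its methods, and the thread names are hypothetical, and the OOME is simulated by throwing the error directly. A dead thread is only one symptom; a check like this cannot tell you whether shared data structures were left in an inconsistent state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/** Sketch: a liveness check that reports registered worker threads that have died. */
final class WorkerLivenessCheck {
    private final List<Thread> workers = new ArrayList<>();

    synchronized void register(Thread worker) {
        workers.add(worker);
    }

    /** Returns the names of registered threads that have terminated. */
    synchronized List<String> deadWorkers() {
        List<String> dead = new ArrayList<>();
        for (Thread w : workers) {
            if (w.getState() == Thread.State.TERMINATED) {
                dead.add(w.getName());
            }
        }
        return dead;
    }

    boolean healthy() {
        return deadWorkers().isEmpty();
    }

    /** Starts one healthy worker and one that dies from a simulated OOME. */
    static List<String> demo() {
        WorkerLivenessCheck check = new WorkerLivenessCheck();
        CountDownLatch stop = new CountDownLatch(1);
        Thread ok = new Thread(() -> {
            try {
                stop.await(); // stays alive until the demo finishes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "worker-ok");
        Thread failing = new Thread(() -> {
            throw new OutOfMemoryError("simulated"); // stands in for a real allocation failure
        }, "worker-oom");
        failing.setUncaughtExceptionHandler((t, e) -> { }); // keep stderr quiet in the demo
        check.register(ok);
        check.register(failing);
        ok.start();
        failing.start();
        try {
            failing.join();
            return check.deadWorkers();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            stop.countDown(); // let the healthy worker exit
        }
    }

    public static void main(String[] args) {
        System.out.println("dead workers: " + demo());
    }
}
```

A monitoring endpoint or scheduled task could call `healthy()` and trigger a restart when it returns false.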

EDIT 2

In response to this comment:

"'other threads may be left waiting for notifies (etc) that never come' Really? Wouldn't the killed thread unwind its stack, releasing resources as it goes, including held locks?"

Yes, really! Consider this:

Thread #1 runs this:

    synchronized (lock) {
        while (!someCondition) {
            lock.wait();
        }
    }
    // ...

Thread #2 runs this:

    synchronized (lock) {
        // do something
        lock.notify();
    }

If Thread #1 is waiting on the notify, and Thread #2 gets an OOME in the // do something section, then Thread #2 won't make the notify() call, and Thread #1 may get stuck forever waiting for a notification that will never occur. Sure, Thread #2 is guaranteed to release the mutex on the lock object, but that is not sufficient!
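The scenario can be reproduced with a small, self-contained sketch. The assumptions are mine: the class `LostNotifyDemo` and its method names are hypothetical, the OOME is simulated by throwing the error directly, and the waiter uses a timed wait (unlike the snippet above) purely so the demo terminates instead of hanging.

```java
/** Sketch: a notifier that dies before notify() leaves the waiter without a signal. */
final class LostNotifyDemo {
    private final Object lock = new Object();
    private boolean someCondition = false;

    /** Waiting side: gives up after timeoutMillis so the demo can end. */
    boolean awaitCondition(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        synchronized (lock) {
            while (!someCondition) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    return false; // the notification never arrived
                }
                lock.wait(remainingMillis);
            }
            return true;
        }
    }

    /** Notifying side: fails in the "do something" step, so notify() is skipped. */
    void doSomethingAndNotify(boolean failWithOome) {
        synchronized (lock) {
            if (failWithOome) {
                throw new OutOfMemoryError("simulated"); // stands in for a real allocation failure
            }
            someCondition = true;
            lock.notify();
        } // the monitor is released here even when the error propagates
    }

    /** Runs both sides and reports whether the waiter ever saw the condition. */
    static boolean run(boolean failWithOome, long timeoutMillis) {
        LostNotifyDemo demo = new LostNotifyDemo();
        Thread notifier = new Thread(() -> demo.doSomethingAndNotify(failWithOome), "thread-2");
        notifier.setUncaughtExceptionHandler((t, e) -> { }); // the thread still dies, just quietly
        notifier.start();
        try {
            boolean signalled = demo.awaitCondition(timeoutMillis);
            notifier.join();
            return signalled;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("normal run signalled: " + run(false, 2000));
        System.out.println("OOME run signalled:   " + run(true, 200));
    }
}
```

In the failing run the lock is released as the error unwinds the stack, yet the waiter still times out, because releasing a monitor is not the same thing as sending a notification.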

"If not, the code run by the thread is not exception safe, which is a more general problem."

"Exception safe" is not a term I've heard of (though I know what you mean). Java programs are not normally designed to be resilient to unexpected exceptions. Indeed, in a scenario like the one above, making the application exception safe is likely to be somewhere between hard and impossible.

You'd need some mechanism whereby the failure of Thread #2 (due to the OOME) gets turned into an inter-thread communication failure notification to Thread #1. Erlang does this, but Java does not. The reason they can do this in Erlang is that Erlang processes communicate using strict CSP-like primitives; i.e. there is no sharing of data structures!
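For a single handoff, such a mechanism can be approximated by hand, as sketched below. This is an illustration and not a general solution: the class `FailureAwareHandoff` and its methods are hypothetical, the OOME is simulated, and the work runs outside the lock, unlike the snippet above. Under a real OOME even this recovery path could fail, since building the exception in `await()` needs memory too.

```java
/** Sketch: report the notifying thread's failure to the waiter instead of leaving it blocked. */
final class FailureAwareHandoff {
    private final Object lock = new Object();
    private boolean done = false;
    private Throwable failure = null;

    /** Notifying side: runs the work and always wakes the waiter, recording any failure. */
    void runAndSignal(Runnable work) {
        Throwable caught = null;
        try {
            work.run();
        } catch (Throwable t) {
            caught = t;
            throw t; // precise rethrow: only unchecked throwables can reach this point
        } finally {
            synchronized (lock) {
                done = true;
                failure = caught;
                lock.notifyAll();
            }
        }
    }

    /** Waiting side: blocks until the work finished, and throws if the other thread failed. */
    void await() throws InterruptedException {
        synchronized (lock) {
            while (!done) {
                lock.wait();
            }
            if (failure != null) {
                throw new IllegalStateException("peer thread failed", failure);
            }
        }
    }

    /** Runs work on a second thread; returns the failure the waiter observed, or null. */
    static Throwable demo(Runnable work) {
        FailureAwareHandoff handoff = new FailureAwareHandoff();
        Thread worker = new Thread(() -> handoff.runAndSignal(work), "notifier");
        worker.setUncaughtExceptionHandler((t, e) -> { }); // keep the demo output quiet
        worker.start();
        try {
            handoff.await();
            return null;
        } catch (IllegalStateException e) {
            return e.getCause();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return e;
        }
    }

    public static void main(String[] args) {
        Throwable seen = demo(() -> { throw new OutOfMemoryError("simulated"); });
        System.out.println("waiter observed: " + seen);
    }
}
```

The standard library's `Future` follows a similar pattern: `Future.get()` throws an `ExecutionException` whose cause is the throwable that ended the task. Either way, every handoff has to be written with failure in mind, which is the added complexity the answer describes.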

(Note that you could get the above problem with just about any unexpected exception, not just Error exceptions. There are certain kinds of Java code where attempting to recover from an unexpected exception is likely to end badly.)
