Object.wait() exceeds timeout


Question

What could explain the duration of Object.wait(timeout) exceeding the provided timeout value?

long start = System.currentTimeMillis();
obj.wait(1000);
long duration = System.currentTimeMillis() - start;
// sometimes (very rarely) duration may exceed 1500

Context: Somewhere deep inside a very complex piece of software, there is code that performs such a wait and logs a warning when the duration is excessive. In production, under high traffic, some logs report huge overwaits (around 30 seconds). I'm trying to reproduce this, understand what may be happening, and find out how to fix or improve it.

Recommended answer

The "user time" or "wall-clock time" spent in a wait(timeout) call is usually the timeout value plus the time that passes until the thread is rescheduled and actually runs again.

See the Javadoc for the Object.wait(long timeout) method:


"The thread T is then [...] re-enabled for thread scheduling. It then competes in the usual manner with other threads for the right to synchronize on the object;"

So there is no guarantee of "real-time" behavior; it is more of a "best effort" that depends on the current system load and possibly on other locking dependencies in your application. If the system is under heavy load, or your application runs many threads, the wait may take considerably longer than the timeout.
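To see how much of a wait is "timeout" and how much is overhead, it can help to measure the whole call with a monotonic clock. The following is only a minimal sketch: the class TimedWaitSketch, its ready flag, and the method names are hypothetical and not taken from the question. It uses System.nanoTime() because System.currentTimeMillis() follows the wall clock and can shift when the system time is adjusted, and it waits in a loop against a deadline, as the Javadoc recommends, so that early wake-ups do not end the wait prematurely.

```java
final class TimedWaitSketch {
    private final Object lock = new Object();
    private boolean ready; // guarded by lock

    /** Marks the condition as fulfilled and wakes up all waiting threads. */
    void signal() {
        synchronized (lock) {
            ready = true;
            lock.notifyAll();
        }
    }

    /**
     * Waits until signal() has been called or timeoutMs has elapsed.
     * Returns the elapsed time in milliseconds, including the time needed
     * to acquire and re-acquire the monitor.
     */
    long awaitReady(long timeoutMs) throws InterruptedException {
        long start = System.nanoTime();
        long deadline = start + timeoutMs * 1_000_000L;
        synchronized (lock) {
            while (!ready) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    break; // timed out
                }
                // wait(0) would wait forever, so always wait at least 1 ms
                lock.wait(Math.max(1L, remainingNanos / 1_000_000L));
            }
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }
}
```

The caller can compare the returned value with the requested timeout and log the difference as the overwait.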

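PS

The quote that @nathan-hughes mentioned in his comment on your question is probably the key sentence in the Javadoc of the wait method: "The specified amount of real time has elapsed, more or less."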

PPS

Based on your question edit with additional context information ('very complex software', 'high traffic', 'huge overwaits'): you have to find all usages of your obj object as a lock, and determine how those usages interact with each other.

This can get really complex. Here is an attempt to sketch a "simple" scenario of what might go wrong, with only two plain threads, for example:

// thread 1
synchronized (obj) {
    // wait 1000ms
    obj.wait(1000);
}
// check for overwait

// thread 2, after, let's say 500 ms
synchronized (obj) {
    obj.notify();
}

Easy scenario, everything is fine; the execution order is roughly:


  1. 0ms: T1 acquires the lock on 'obj'
  2. 0ms: T1 registers itself as waiting on 'obj' and is excluded from thread scheduling. While it is excluded from thread scheduling, the lock on 'obj' is released again (!)
  3. 500ms: T2 acquires the lock on 'obj', notifies one thread waiting for notification (thread is chosen based on thread scheduling settings), and releases the lock on 'obj'
  4. 500ms + X: T1 is re-enabled for thread scheduling, waits until it re-acquires the lock on 'obj' (!), then finishes its block and releases the lock on 'obj'.

These are only two simple threads and synchronized blocks. Let's make this more complex, with poorly written code. What if the second thread looked something like this:

// bad variant of thread 2, after, let's say 500 ms
synchronized (obj) {
    obj.notify();

    // do complex operation, taking more than few ms,
    // maybe a heavy SQL query/update...
}

In this case, even though T1 has been notified (or maybe has timed out), it has to wait until it regains the lock on 'obj', which is still held by T2 for as long as the complex operation runs (step 3 in the previous list)! This might indeed take up to ... seconds or more.
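The effect can be reproduced with a small, self-contained program. This is a sketch under assumptions, not the code from the question: the class name OverwaitDemo, the CountDownLatch used to order the two threads, and the Thread.sleep that stands in for the "heavy operation" are all made up for illustration.

```java
import java.util.concurrent.CountDownLatch;

final class OverwaitDemo {
    /**
     * Measures how long obj.wait(timeoutMs) takes (in ms) when a second
     * thread notifies and then keeps holding the monitor for holdMs.
     */
    static long measureWait(long timeoutMs, long holdMs) throws InterruptedException {
        final Object obj = new Object();
        final CountDownLatch t1HoldsMonitor = new CountDownLatch(1);

        Thread t2 = new Thread(() -> {
            try {
                t1HoldsMonitor.await();
                // T2 can only enter here once T1 has released the monitor inside wait()
                synchronized (obj) {
                    obj.notify();
                    // stand-in for a heavy operation; sleeping does NOT release the monitor
                    Thread.sleep(holdMs);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "T2");
        t2.start();

        long elapsedNanos;
        synchronized (obj) {
            t1HoldsMonitor.countDown();
            long start = System.nanoTime();
            obj.wait(timeoutMs); // returns only after the monitor has been re-acquired
            elapsedNanos = System.nanoTime() - start;
        }
        t2.join();
        return elapsedNanos / 1_000_000L;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("wait(100) took " + measureWait(100, 500) + " ms");
    }
}
```

With a timeout of 100 ms and a hold time of 500 ms, the measured duration is governed by the hold time rather than by the timeout, because wait() cannot return before T2 leaves its synchronized block.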

Even more complexity: we return to our initial simple threads T1 and T2, but add a third thread:

// thread 3, after, let's say also 500 ms
synchronized (obj) {
    // do complex operation, taking more than few ms,
    // maybe a heavy SQL query/update...
}

The execution order might be roughly:


  1. 0ms: T1 acquires the lock on 'obj'
  2. 0ms: T1 registers itself as waiting on 'obj' and is excluded from thread scheduling. While it is excluded from thread scheduling, the lock on 'obj' is released again (!)
  3. 500ms: T2 acquires the lock on 'obj', notifies one thread waiting for notification (thread is chosen based on thread scheduling settings), and releases the lock on 'obj'
  4. 500ms + X: T1 is re-enabled for thread scheduling, but does not get the lock on 'obj', because
  5. 500ms + X: T3 is scheduled by the thread scheduler before T1, acquires the lock on 'obj' (!), and starts its complex operation. T1 can't do anything but wait!
  6. 500ms + MANY: T3 releases the lock on 'obj'.
  7. 500ms + MANY: T1 re-acquires the lock on 'obj' (!), then exits its synchronized block and releases the lock on 'obj'.

This only scratches the surface of what might happen in your 'very complex software' with 'high traffic'. Add more threads, some of them perhaps poorly coded (e.g. doing too much inside 'synchronized' blocks), plus high traffic, and you might easily get the overwaits you mentioned.

OPTIONS

How to solve this depends on the purpose and complexity of your software; there is no simple plan. Not much more can be said based on the available information.

Maybe re-analysing the code with pen and paper is enough, maybe profiling it could help you find the locks, maybe you can get the needed information about the current locks via JMX or a thread dump (via signal, jconsole, jcmd, jvisualvm), or by monitoring with Java Mission Control and Java Flight Recording (features available since ... JDK 7u40 I think).
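As one possible way to collect lock information from inside the application, for example at the point where the overwait warning is logged, the standard java.lang.management API can be queried. This is a rough sketch; the class LockReport and its method name are hypothetical, and the output format is arbitrary.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class LockReport {
    /** Lists every thread that is currently blocked on, or waiting for, a lock. */
    static String describeWaitingThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // only request monitor/synchronizer details if this JVM supports them
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());

        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : infos) {
            if (info == null || info.getLockName() == null) {
                continue; // thread is not waiting for any lock
            }
            sb.append(info.getThreadName())
              .append(" [").append(info.getThreadState()).append("] on ")
              .append(info.getLockName());
            if (info.getLockOwnerName() != null) {
                sb.append(" held by ").append(info.getLockOwnerName());
            }
            sb.append(System.lineSeparator());
        }
        return sb.toString();
    }
}
```

Logging this report together with the overwait warning may show which thread was holding 'obj' at that moment. Taking a full thread dump is not free, so it is probably best done only when an overwait has actually been detected.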

You've asked in a comment whether Thread.sleep(timeout) would help: that can't be said without more information. Maybe it would help. Or maybe reentrant locks or other locking options (see the packages java.util.concurrent, java.util.concurrent.atomic, java.util.concurrent.locks) would be more appropriate. It depends on your code, your use case, and the Java version you're using.
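For illustration, here is what a timed wait might look like with java.util.concurrent.locks. It is a sketch only: the class ConditionWaitSketch and the ready flag are hypothetical. Note that Condition.awaitNanos also has to re-acquire the lock before it returns, so this does not remove overwaits by itself; what it offers is the remaining time as a return value and a lock that supports tryLock and fairness settings.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

final class ConditionWaitSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition();
    private boolean ready; // guarded by lock

    /** Returns true if the condition was signalled in time, false on timeout. */
    boolean awaitReady(long timeout, TimeUnit unit) throws InterruptedException {
        long remainingNanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (!ready) {
                if (remainingNanos <= 0L) {
                    return false; // timed out
                }
                // awaitNanos returns an estimate of the time still left to wait
                remainingNanos = changed.awaitNanos(remainingNanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Sets the flag and wakes up all waiters; keeps the critical section short. */
    void signalReady() {
        lock.lock();
        try {
            ready = true;
            changed.signalAll();
        } finally {
            lock.unlock();
        }
    }
}
```

Whichever primitive is used, keeping the work done while holding the lock as short as signalReady() does here is what limits the time a woken thread spends waiting to get the lock back.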

If GC is not an issue (see below), you have analyzed the code and it "looks fine", and you think the high traffic is the cause, you might also consider enabling biased locking and/or spin locking. See the Java 7 JVM options for more details (the article contains links to the Java 8 JVM options too).

GARBAGE COLLECTION

By the way, 'high traffic' should have made me ask this earlier: have you monitored the garbage collection? If it is not properly configured or tuned, GC can also cause very significant pauses, and often! (I had such a case this week: 15-30 seconds for a full GC...)
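A quick way to check whether GC activity coincides with the overwaits is to sample the JVM's collector statistics before and after the wait. The sketch below uses the standard GarbageCollectorMXBean API; the class GcSnapshot and its method names are hypothetical. The reported time is an approximate accumulated collection time per collector, and for concurrent collectors it is not necessarily all pause time, so treat it as a hint rather than a measurement of pauses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcSnapshot {
    /** Total number of collections so far, summed over all collectors. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if undefined for this collector
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }
}
```

If totalGcMillis() grows by several seconds across a single wait(1000) call, GC becomes a strong suspect; if it barely changes, lock contention as described above is the more likely explanation.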
