What mechanisms other than mutexes or garbage collection can slow my multi-threaded Java program?



Problem

I have a piece of Java code (JDK 1.6.0_22, if relevant) that implements a stateless, side-effect-free function with no mutexes. It does, however, use a lot of memory (I don't know if that is relevant).

In the past I have visited Sun Laboratories and gathered the standard "performance vs number of threads" curve. As this function has no mutexes, it has a nice graph, although garbage collection kicked in as the number of threads increased. After some garbage collection tuning I was able to make this curve almost flat.

I am now doing the same experiment on Intel hardware. The hardware has 4 CPUs, each with 8 cores and hyperthreading. This gives 64 availableProcessors(). Unfortunately the curve of "performance vs number of threads" scales nicely for 1, 2, 3 threads, and caps at 3 threads. After 3 threads I can put as many threads as I want on the task, and the performance gets no better.

Attempts to fix the Problem

My first thought was that I had been stupid and introduced some synchronised code somewhere. Normally to resolve this issue I run JConsole or JVisualVM, and look at the thread stacktraces. If I have 64 threads running at the speed of 3, I would expect 61 of them to be sitting waiting to enter a mutex. I didn't find this. Instead I found all the threads running: just very slowly.

A second thought was that perhaps the timing framework was introducing problems. I replaced my function with a dummy function that just counts to a billion using an AtomicLong. This scaled beautifully with the number of threads: I was able to count to a billion 10,000 times, 64 times quicker with 64 threads than with 1 thread.
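A scaling check of this kind can be sketched roughly as follows. This is only an illustration, not the actual timing framework: `ScalingCheck`, `countTo` and `runOnThreads` are made-up names, and it assumes each task counts on its own private AtomicLong (a counter shared between threads would contend and would not be expected to scale).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical harness: all names here are illustrative, not from the original code.
final class ScalingCheck {

    // Dummy workload: counts up to `limit` on a private AtomicLong (assumption: no shared state).
    static long countTo(long limit) {
        AtomicLong counter = new AtomicLong();
        while (counter.incrementAndGet() < limit) {
            // keep counting
        }
        return counter.get();
    }

    // Runs the workload once on each of `threads` threads and returns the elapsed nanoseconds.
    static long runOnThreads(int threads, final long limit) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<Callable<Long>>();
            for (int i = 0; i < threads; i++) {
                tasks.add(new Callable<Long>() {
                    public Long call() {
                        return countTo(limit);
                    }
                });
            }
            long start = System.nanoTime();
            // invokeAll blocks until every task has finished.
            for (Future<Long> result : pool.invokeAll(tasks)) {
                if (result.get() != limit) {
                    throw new IllegalStateException("unexpected count");
                }
            }
            return System.nanoTime() - start;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long limit = 100000000L;
        int max = Runtime.getRuntime().availableProcessors();
        for (int threads = 1; threads <= max; threads *= 2) {
            long nanos = runOnThreads(threads, limit);
            // counts per microsecond equals millions of counts per second
            System.out.printf("%d threads: %.1f M counts/s%n",
                    threads, threads * limit / (nanos / 1e3));
        }
    }
}
```

If the harness itself is sound, the reported throughput should grow roughly in line with the thread count until the hardware runs out of cores.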

I thought (desperation kicking in) perhaps garbage collection is taking a really really long time, so I tweaked the garbage collection parameters. While this improved my latency variation, it had no effect on throughput: I still have 64 threads running at the speed I expect 3 to run at.

I have downloaded the Intel tool VTune, but my skill with it is weak: it is a complex tool and I don't understand it yet. I have the instruction book on order: a fun Christmas present to myself, but that is a little too late to help my current problem.

Question

  1. What tools (mental or software) could I use to improve my understanding of what is going on?
  2. What mechanisms other than mutexes or garbage collection could be slowing my code down?

Solution

Well, many experiments later I discovered that the JVM makes no difference, but I also discovered the power of JDump. 50 of the 64 threads were at the following line.

java.lang.Thread.State: RUNNABLE
    at java.util.Random.next(Random.java:189)
    at java.util.Random.nextInt(Random.java:239)
    at sun.misc.Hashing.randomHashSeed(Hashing.java:254)
    at java.util.HashMap.<init>(HashMap.java:255)
    at java.util.HashMap.<init>(HashMap.java:297)
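A similar tally can be gathered from inside the process. The sketch below is only an illustration, not the tool used above: it counts how many threads share the same top stack frame, using `Thread.getAllStackTraces()`. `TopFrameHistogram` and `topFrames` are made-up names.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: names are illustrative.
final class TopFrameHistogram {

    // Counts threads by the class and method of their top stack frame.
    static Map<String, Integer> topFrames(Map<Thread, StackTraceElement[]> traces) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (StackTraceElement[] stack : traces.values()) {
            if (stack.length == 0) {
                continue; // thread without a Java stack (not started or already finished)
            }
            String key = stack[0].getClassName() + "." + stack[0].getMethodName();
            Integer old = counts.get(key);
            counts.put(key, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Snapshot of every live thread in this JVM.
        Map<String, Integer> counts = topFrames(Thread.getAllStackTraces());
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getValue() + " thread(s) at " + entry.getKey());
        }
    }
}
```

Sampling this a few times while the workload runs and seeing most threads at one frame points at the same kind of hot spot, even when every thread reports RUNNABLE.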

Random.next looks like this

    protected int next(int bits) {
        long oldseed, nextseed;
        AtomicLong seed = this.seed;
        do {
            oldseed = seed.get();
            nextseed = (oldseed * multiplier + addend) & mask;
        } while (!seed.compareAndSet(oldseed, nextseed));
        return (int)(nextseed >>> (48 - bits));
    }

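Most interesting is that this isn't an obvious lock, so the tools I was using to spot mutexes weren't working: a thread that loses the compareAndSet race simply retries the loop, so it shows up as RUNNABLE rather than BLOCKED.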
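The retry behaviour can be made visible with a small experiment. The following is a minimal sketch, not the JDK code itself: it repeats the same compare-and-set loop on a shared AtomicLong and counts how many attempts fail. `CasContention`, `advance` and `measure` are made-up names; the constants are the ones java.util.Random uses for its generator.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo: names are illustrative.
final class CasContention {
    // Linear congruential constants used by java.util.Random.
    static final long MULTIPLIER = 0x5DEECE66DL;
    static final long ADDEND = 0xBL;
    static final long MASK = (1L << 48) - 1;

    final AtomicLong seed = new AtomicLong(42L);

    // Advances the shared seed once; returns how many CAS attempts failed before success.
    long advance() {
        long failed = 0;
        while (true) {
            long oldSeed = seed.get();
            long nextSeed = (oldSeed * MULTIPLIER + ADDEND) & MASK;
            if (seed.compareAndSet(oldSeed, nextSeed)) {
                return failed;
            }
            failed++; // another thread changed the seed first, so retry
        }
    }

    // Starts `threads` threads that each advance one shared seed `perThread` times.
    // Returns the total number of failed CAS attempts across all threads.
    static long measure(int threads, final int perThread) {
        final CasContention shared = new CasContention();
        final AtomicLong totalFailed = new AtomicLong();
        final CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        start.await(); // release all workers at the same moment
                    } catch (InterruptedException e) {
                        return;
                    }
                    long failed = 0;
                    for (int n = 0; n < perThread; n++) {
                        failed += shared.advance();
                    }
                    totalFailed.addAndGet(failed);
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return totalFailed.get();
    }

    public static void main(String[] args) {
        int[] threadCounts = { 1, 2, 4, 8 };
        for (int threads : threadCounts) {
            System.out.println(threads + " thread(s): "
                    + measure(threads, 1000000) + " failed CAS attempts");
        }
    }
}
```

With a single thread the compare-and-set never fails. With several threads on a multi-core machine the failure count typically climbs, and every failed attempt is work thrown away while the thread still looks busy.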

So it looks as though any creation of Java HashMaps causes applications to stop being scalable (I exaggerate, but not much). My application does make heavy use of HashMaps, so I guess I either rewrite HashMap or rewrite the application.

I'm raising a separate question to see how to deal with this.

Thanks for all the help
