Why are my Opteron cores running at only 75% capacity each? (25% CPU idle)


Question

We've just taken delivery of a powerful 32-core AMD Opteron server with 128 GB of RAM. We have 2 x 6272 CPUs with 16 cores each. We are running a big, long-running Java task on 30 threads. We have the NUMA optimisations for Linux and Java turned on. Our Java threads mainly use objects that are private to that thread, sometimes read memory that other threads will also read, and very occasionally write to or lock shared objects.

We can't explain why the CPU cores are 25% idle. Below is a dump of "top":


top - 23:06:38 up 1 day, 23 min,  3 users,  load average: 10.84, 10.27, 9.62
Tasks: 676 total,   1 running, 675 sleeping,   0 stopped,   0 zombie
Cpu(s): 64.5%us,  1.3%sy,  0.0%ni, 32.9%id,  1.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  132138168k total, 131652664k used,   485504k free,    92340k buffers
Swap:  5701624k total,   230252k used,  5471372k free, 13444344k cached
...
top - 22:37:39 up 23:54,  3 users,  load average: 7.83, 8.70, 9.27
Tasks: 678 total,   1 running, 677 sleeping,   0 stopped,   0 zombie
Cpu0  : 75.8%us,  2.0%sy,  0.0%ni, 22.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 77.2%us,  1.3%sy,  0.0%ni, 21.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 77.3%us,  1.0%sy,  0.0%ni, 21.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 77.8%us,  1.0%sy,  0.0%ni, 21.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  : 76.9%us,  2.0%sy,  0.0%ni, 21.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  : 76.3%us,  2.0%sy,  0.0%ni, 21.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  : 12.6%us,  3.0%sy,  0.0%ni, 84.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  8.6%us,  2.0%sy,  0.0%ni, 89.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu8  : 77.0%us,  2.0%sy,  0.0%ni, 21.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu9  : 77.0%us,  2.0%sy,  0.0%ni, 21.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu10 : 77.6%us,  1.7%sy,  0.0%ni, 20.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu11 : 75.7%us,  2.0%sy,  0.0%ni, 21.4%id,  1.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu12 : 76.6%us,  2.3%sy,  0.0%ni, 21.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu13 : 76.6%us,  2.3%sy,  0.0%ni, 21.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu14 : 76.2%us,  2.6%sy,  0.0%ni, 15.9%id,  5.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu15 : 76.6%us,  2.0%sy,  0.0%ni, 21.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu16 : 73.6%us,  2.6%sy,  0.0%ni, 23.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu17 : 74.5%us,  2.3%sy,  0.0%ni, 23.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu18 : 73.9%us,  2.3%sy,  0.0%ni, 23.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu19 : 72.9%us,  2.6%sy,  0.0%ni, 24.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu20 : 72.8%us,  2.6%sy,  0.0%ni, 24.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu21 : 72.7%us,  2.3%sy,  0.0%ni, 25.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu22 : 72.5%us,  2.6%sy,  0.0%ni, 24.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu23 : 73.0%us,  2.3%sy,  0.0%ni, 24.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu24 : 74.7%us,  2.7%sy,  0.0%ni, 22.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu25 : 74.5%us,  2.6%sy,  0.0%ni, 22.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu26 : 73.7%us,  2.0%sy,  0.0%ni, 24.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu27 : 74.1%us,  2.3%sy,  0.0%ni, 23.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu28 : 74.1%us,  2.3%sy,  0.0%ni, 23.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu29 : 74.0%us,  2.0%sy,  0.0%ni, 24.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu30 : 73.2%us,  2.3%sy,  0.0%ni, 24.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu31 : 73.1%us,  2.0%sy,  0.0%ni, 24.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  132138168k total, 131711704k used,   426464k free,    88336k buffers
Swap:  5701624k total,   229572k used,  5472052k free, 13745596k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
13865 root      20   0  122g 112g 3.1g S 2334.3 89.6  20726:49 java
27139 jayen     20   0 15428 1728  952 S  2.6  0.0   0:04.21 top
27161 sysadmin  20   0 15428 1712  940 R  1.0  0.0   0:00.28 top
   33 root      20   0     0    0    0 S  0.3  0.0   0:06.24 ksoftirqd/7
  131 root      20   0     0    0    0 S  0.3  0.0   0:09.52 events/0
 1858 root      20   0     0    0    0 S  0.3  0.0   1:35.14 kondemand/0

A dump of the Java stack confirms that none of the threads are anywhere near the few places where locks are used, nor are they anywhere near any disk or network I/O.

I had trouble finding a clear explanation of what "top" means by "idle" versus "wait", but I get the impression that "idle" means "no more threads that need to be run", which doesn't make sense in our case. We're using Executors.newFixedThreadPool(30). There are a large number of tasks pending, and each task lasts for 10 seconds or so.

I suspect that the explanation requires a good understanding of NUMA. Is the "idle" state what you see when a CPU is waiting for a non-local access? If not, then what is the explanation?

Recommended answer

It could be a number of things:


  • It could be contention between threads over access to shared data. This might take the form of lock contention, or extra memory traffic due to read or write barriers, though the latter is unlikely to produce these symptoms.

  • You could be leaking worker threads; e.g. they are occasionally dying and not being replaced.

  • There could be a bottleneck in the executor itself; e.g. it may not be responding quickly enough to tasks finishing by scheduling the next task.

  • The bottleneck could be the garbage collector, especially if you don't have parallel collection enabled.
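The worker-thread and executor possibilities can be checked from inside the JVM. The following is a minimal sketch, not the asker's code: the class name PoolProbe and the sleeping placeholder tasks are made up for illustration, and it relies on Executors.newFixedThreadPool returning a ThreadPoolExecutor, which holds in current JDKs but is not promised by the method's declared return type. If "active" stays below 30 while "queued" is well above zero, that would point at the pool rather than at the tasks.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

public class PoolProbe {
    /** One-line summary of the pool's current state. */
    static String snapshot(ThreadPoolExecutor pool) {
        return String.format("poolSize=%d active=%d queued=%d completed=%d",
                pool.getPoolSize(),            // worker threads that currently exist
                pool.getActiveCount(),         // workers currently running a task (approximate)
                pool.getQueue().size(),        // tasks waiting for a free worker
                pool.getCompletedTaskCount()); // tasks finished so far (approximate)
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumption: the cast works because newFixedThreadPool is backed by ThreadPoolExecutor.
        ThreadPoolExecutor pool = (ThreadPoolExecutor) Executors.newFixedThreadPool(30);

        // Hypothetical stand-in for the real 10-second tasks.
        for (int i = 0; i < 100; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(200);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        Thread.sleep(100); // let the workers pick up their first tasks
        System.out.println(snapshot(pool));
        pool.shutdownNow();
    }
}
```

In a real application the snapshot would be logged periodically from a separate monitoring thread while the job runs.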

This page talks about Java's NUMA enhancements, and mentions the NUMA-aware GC switch. Try that. Also check out the other GC tuning advice on that page.
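Before tuning the collector, it may help to measure how much time it is actually taking. The sketch below uses the standard java.lang.management API; the class name GcProbe is made up for illustration. The reported figure is approximate accumulated elapsed time per collector, so treat the percentage as a rough indicator only. It describes the JVM it runs in, so the calls would need to be made from inside the application (or read over JMX) to be meaningful.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcProbe {
    /** Approximate milliseconds spent in GC so far, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: collections=%d timeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC share of uptime: %.1f%%%n",
                100.0 * totalGcMillis() / Math.max(1, uptimeMs));
    }
}
```

A GC share in the region of 20-25% would be consistent with the idle time in the question; a share near zero would suggest looking elsewhere.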

This question explains the process states: "In Linux, what do all the values in the "top" command mean?"

I think that the difference between "wa" and "idle" time in the processor summary is that "wa" means that the processor has threads in "D" state, i.e. waiting for disk I/O. By contrast, a processor where all threads are waiting in "S" state would be counted as "idle". (From this perspective, a thread that is waiting on a lock would be in "S" state.)
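On Linux, the "id" and "wa" columns come from the idle and iowait counters in /proc/stat, whose per-CPU lines list tick counts in the order user, nice, system, idle, iowait, irq, softirq, steal. The sketch below parses one such line; the class name CpuStatLine and the sample figures are made up for illustration. Note that the counters are cumulative since boot, whereas top reports the change between two samples, so a faithful reproduction would subtract two readings.

```java
public class CpuStatLine {
    /**
     * Returns {idle %, iowait %} for one "cpu" or "cpuN" line of /proc/stat.
     * Only the first eight counters are summed, because the trailing guest
     * counters are already included in user and nice.
     */
    static double[] idleAndIowaitPercent(String line) {
        String[] fields = line.trim().split("\\s+");
        long total = 0;
        for (int i = 1; i < fields.length && i <= 8; i++) {
            total += Long.parseLong(fields[i]);
        }
        long idle = Long.parseLong(fields[4]);   // ticks with nothing runnable
        long iowait = Long.parseLong(fields[5]); // idle ticks with I/O outstanding
        return new double[] {100.0 * idle / total, 100.0 * iowait / total};
    }

    public static void main(String[] args) {
        // Made-up sample line, not taken from the server in the question.
        String sample = "cpu0 7580 0 200 2220 0 0 0 0";
        double[] p = idleAndIowaitPercent(sample);
        System.out.printf("idle=%.1f%% iowait=%.1f%%%n", p[0], p[1]);
    }
}
```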

You could also try top -H, which shows the threads individually.
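A rough in-JVM complement to the per-thread view is to count threads by their Java state. This is a minimal sketch with a made-up class name, ThreadStates. Java states do not map one-to-one onto the kernel states that top shows: a RUNNABLE thread may still be blocked in native I/O, for example. Many pool workers in WAITING or BLOCKED while tasks are pending would nonetheless be a useful hint.

```java
import java.util.EnumMap;
import java.util.Map;

public class ThreadStates {
    /** Counts the live threads in this JVM by Thread.State. */
    static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints one line per state that has at least one thread.
        countByState().forEach((state, n) -> System.out.println(state + ": " + n));
    }
}
```

Sampling this a few times a second during the run gives a coarse picture of how many of the 30 workers are actually runnable at any moment.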
