Memory settings with thousands of threads


Problem description

I'm playing around with the JVM (Oracle 1.7 64 bit) on a Linux box (AMD 6 Core, 16 GB RAM) to see how the number of threads in an application affects performance. I'm hoping to measure at which point context switching degrades performance.

I have created a little application that creates a thread execution pool:

Executors.newFixedThreadPool(numThreads)

I adjust numThreads every time I run the program, to see the effect it has.

I then submit numThread jobs (instances of java.util.concurrent.Callable) to the pool. Each one increments an AtomicInteger, does some work (creates an array of random integers and shuffles it), and then sleeps a while. The idea is to simulate a web service call. Finally, the job resubmits itself to the pool, so that I always have numThreads jobs working.

I am measuring the throughput, as in the number of jobs that are processed per minute.
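The benchmark code itself isn't shown, so here is a minimal sketch of the setup described above; the class name, array size, sleep duration, and measurement interval are my own illustrative choices, not the original values:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadBench {
    static final AtomicInteger jobsDone = new AtomicInteger();

    public static void main(String[] args) throws Exception {
        int numThreads = args.length > 0 ? Integer.parseInt(args[0]) : 4;
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);

        // Seed the pool with numThreads self-resubmitting jobs.
        for (int i = 0; i < numThreads; i++) {
            pool.submit(job(pool));
        }

        // Measure for a fixed interval, then report the count.
        Thread.sleep(2000);
        pool.shutdownNow();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("jobs completed: " + jobsDone.get());
    }

    static Callable<Void> job(ExecutorService pool) {
        return () -> {
            jobsDone.incrementAndGet();
            // "Work": create an array of random integers and shuffle it.
            Random rnd = new Random();
            List<Integer> data = new ArrayList<>();
            for (int i = 0; i < 100; i++) data.add(rnd.nextInt());
            Collections.shuffle(data, rnd);
            // "Wait": simulate the blocking part of a web-service call.
            Thread.sleep(10);
            // Resubmit, so the pool always has numThreads jobs in flight.
            if (!pool.isShutdown()) pool.submit(job(pool));
            return null;
        };
    }
}
```

Scaling the printed count up to a full minute gives the throughput figures discussed below.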

With several thousand threads, I can process up to 400,000 jobs a minute. Above 8000 threads, the results start to vary a lot, suggesting that context switching is becoming a problem. But I can continue to increase the number of threads to 30,000 and still get higher throughput (between 420,000 and 570,000 jobs per minute).

Now the question: I get a java.lang.OutOfMemoryError: Unable to create new native thread with more than about 31,000 jobs. I have tried setting -Xmx6000M which doesn't help. I tried playing with -Xss but that doesn't help either.

I've read that ulimit can be useful, but increasing with ulimit -u 64000 didn't change anything.

For information:

[root@apollo ant]# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 127557
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

So the question #1: What do I have to do to be able to create a bigger thread pool?

Question #2: At what stage should I expect to see context switching really reducing throughput and causing the process to grind to a halt?

Here are some results, after I modified it to do a little more processing (as was suggested) and started recording average response times (as was also suggested).

// ( (n_cores x t_request) / (t_request - t_wait) ) + 1
// 300 ms wait, 10ms work, roughly 310ms per job => ideal response time, 310ms
// ideal num threads = 1860 / 10 + 1 = 187 threads
//
// results:
//
//   100 =>  19,000 thruput,  312ms response, cpu < 50%
//   150 =>  28,500 thruput,  314ms response, cpu 50%
//   180 =>  34,000 thruput,  318ms response, cpu 60%
//   190 =>  35,800 thruput,  317ms response, cpu 65%
//   200 =>  37,800 thruput,  319ms response, cpu 70%
//   230 =>  42,900 thruput,  321ms response, cpu 80%
//   270 =>  50,000 thruput,  324ms response, cpu 80%
//   350 =>  64,000 thruput,  329ms response, cpu 90%
//   400 =>  72,000 thruput,  335ms response, cpu >90%
//   500 =>  87,500 thruput,  343ms response, cpu >95%
//   700 => 100,000 thruput,  430ms response, cpu >99%
//  1000 => 100,000 thruput,  600ms response, cpu >99%
//  2000 => 105,000 thruput, 1100ms response, cpu >99%
//  5000 => 131,000 thruput, 1600ms response, cpu >99%
// 10000 => 131,000 thruput, 2700ms response, cpu >99%,  16GB Virtual size
// 20000 => 140,000 thruput, 4000ms response, cpu >99%,  27GB Virtual size
// 30000 => 133,000 thruput, 2800ms response, cpu >99%,  37GB Virtual size
// 40000 =>       - thruput,    -ms response, cpu >99%, >39GB Virtual size => java.lang.OutOfMemoryError: unable to create new native thread

I interpret these results as:

1) Even though the application is sleeping for 96.7% of the time, there is still a lot of thread switching to be done.
2) Context switching is measurable, and shows up in the response times.

What is interesting here is that when tuning an app, you might choose an acceptable response time, say 400ms, and increase the number of threads until you hit that response time, which in this case would let the app process around 95,000 requests a minute.

Often people say that the ideal number of threads is near the number of cores. In apps that have wait time (blocked threads, say waiting for a database or web service to respond), the calculation needs to consider that (see my equation above). But even that theoretical ideal isn't an actual ideal, when you look at the results or when you tune to a specific response time.
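Plugging the numbers from the results comment into that equation (6 cores, 310 ms per job, of which 300 ms is waiting) reproduces the 187-thread figure. A sketch of the calculation:

```java
public class IdealThreads {
    // N = (n_cores * t_request) / (t_request - t_wait) + 1
    // t_request = total time per job, t_wait = the blocked (sleeping) part.
    static int idealThreads(int nCores, double tRequestMs, double tWaitMs) {
        return (int) Math.ceil(nCores * tRequestMs / (tRequestMs - tWaitMs)) + 1;
    }

    public static void main(String[] args) {
        // 6 cores, 310 ms per job (300 ms wait + 10 ms work) => 187 threads
        System.out.println(idealThreads(6, 310, 300));
    }
}
```

The intuition: each thread only keeps a core busy for (t_request - t_wait) out of every t_request, so you need roughly t_request / (t_request - t_wait) threads per core to keep the cores saturated.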

Answer


I get a java.lang.OutOfMemoryError: Unable to create new native thread with more than about 31,000 jobs. I have tried setting -Xmx6000M which doesn't help. I tried playing with -Xss but that doesn't help either.

The -Xmx setting won't help because thread stacks are not allocated from the heap.

What is happening is that the JVM is asking the OS for a memory segment (outside of the heap!) to hold the stack, and the OS is refusing the request. The most likely reasons for this are a ulimit or an OS memory resource issue:


  • The "data seg size" ulimit is unlimited, so that shouldn't be the problem.

So that leaves memory resources. 30,000 threads at 1Mb each is ~30Gb, which is far more physical memory than you have. My guess is that there is enough swap space for 30Gb of virtual memory, but you have pushed the boundary just a bit too far.

The -Xss setting should help, but you need to make the requested stack size less than the default size of 1m. There is also a hard minimum size.
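Besides the global -Xss flag, a stack size can also be requested per thread via the Thread(ThreadGroup, Runnable, String, long) constructor. Its Javadoc notes that the VM is free to treat the value as a hint or to ignore it entirely, so this is a sketch of the mechanism, not a guarantee (the 64k figure is illustrative):

```java
public class SmallStackThread {
    static volatile boolean ran = false;

    public static void main(String[] args) throws Exception {
        // Request a 64k stack; the JVM may round this up to a
        // platform-dependent minimum or ignore the hint altogether.
        long stackSizeBytes = 64 * 1024;
        Thread t = new Thread(null, () -> {
            ran = true;
            System.out.println("thread ran with a reduced stack request");
        }, "small-stack", stackSizeBytes);
        t.start();
        t.join();
    }
}
```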


Question #1: What do I have to do to be able to create a bigger thread pool?

Decrease the default stack size below what it currently is, or increase the amount of available virtual memory. (The latter is not recommended, since it looks like you are already seriously over-allocating.)


Question #2: At what stage should I expect to see context switching really reducing throughput and causing the process to grind to a halt?

It is not possible to predict that. It will be highly dependent on what the threads are actually doing. And indeed, I don't think that your benchmarking is going to give you answers that will tell you how a real multi-threaded application is going to behave.

The Oracle site says this on the topic of thread stack space:


In Java SE 6, the default on Sparc is 512k in the 32-bit VM, and 1024k in the 64-bit VM. On x86 Solaris/Linux it is 320k in the 32-bit VM and 1024k in the 64-bit VM.

On Windows, the default thread stack size is read from the binary (java.exe). As of Java SE 6, this value is 320k in the 32-bit VM and 1024k in the 64-bit VM.

You can reduce your stack size by running with the -Xss option. For example:

  java -server -Xss64k
Note that on some versions of Windows, the OS may round up thread stack sizes using very coarse granularity. If the requested size is less than the default size by 1K or more, the stack size is rounded up to the default; otherwise, the stack size is rounded up to a multiple of 1 MB.

64k is the least amount of stack space allowed per thread.
