Reducing memory footprint with multiprocessing?


Question

One of my applications runs about 100 workers. It started out as a threading application, but it ran into performance (latency) problems, so I converted the workers to multiprocessing.Process instances. The benchmark below shows that the reduction in load came at the cost of roughly six times the memory usage.

So where exactly does the extra memory usage come from, given that Linux uses copy-on-write (COW) and the workers do not share any data?

How can I reduce the memory footprint? (Alternative question: how can I reduce the load with threading?)

Benchmarks on Linux 2.6.26, 4 CPUs, 2 GB RAM. (CPU usage is given as a percentage of one CPU, so full load is 400%. The numbers were read off Munin graphs.)

                  | threading | multiprocessing
------------------+-----------+----------------
memory usage      | ~0.25GB   | ~1.5GB
context switches  | ~1.5e4/s  | ~5e2/s
system cpu usage  | ~30%      | ~3%
total cpu usage   | ~100%     | ~50%
load avg          | ~1.5      | ~0.7

Background: the application processes events from the network and stores some of them in a MySQL database.

Recommended answer

My understanding is that with dynamic languages such as Python, copy-on-write is less effective because far more memory gets written to (and therefore copied) after forking. As the Python interpreter works through the program, there is a lot more going on than just your code. Take reference counting: every object will be written to fairly quickly, because updating an object's reference count means writing to the memory that holds the object, which triggers a copy of that page.
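The reference-counting point can be observed directly. The following is a minimal sketch, assuming CPython (other interpreters manage memory differently); `refcount_delta`, `payload`, and `holders` are illustrative names, not part of the original application. It shows that merely storing one more reference to an object changes the count kept in the object's own header, which is a write to the object's memory even though the program never "modifies" the object.

```python
import sys


def refcount_delta():
    """Return how much an object's refcount changes when one reference is added."""
    payload = object()   # hypothetical stand-in for data created before fork()
    holders = []

    before = sys.getrefcount(payload)
    holders.append(payload)          # no mutation of payload, just one more reference
    after = sys.getrefcount(payload)

    # The count lives inside the object itself, so this difference reflects
    # a write to the memory page that holds the object.
    return after - before


if __name__ == "__main__":
    print(refcount_delta())
```

In a forked worker, writes of this kind touch pages that would otherwise stay shared with the parent, which is consistent with the memory growth described in the question.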

With that in mind, you probably need a hybrid threading/multiprocessing approach. Run multiple processes to take advantage of multiple cores, but have each one run multiple threads so that you can still handle the level of concurrency you need. You will have to experiment with the ratio of threads to processes.
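One way the hybrid layout could look is sketched below. Everything here is an assumption for illustration: `handle_event` is a hypothetical placeholder for the real per-event work, the defaults of 4 processes with 25 threads each are simply one way to reach about 100 workers on a 4-CPU machine, and the "fork" start method is chosen because the question is about Linux. It is not the original application's code.

```python
import multiprocessing as mp
from concurrent.futures import ThreadPoolExecutor


def handle_event(event):
    """Hypothetical stand-in for the real per-event work (parse, store, ...)."""
    return event * 2


def process_main(events, results, threads_per_process):
    # Each process runs its own pool of threads over its share of the events.
    with ThreadPoolExecutor(max_workers=threads_per_process) as pool:
        for result in pool.map(handle_event, events):
            results.put(result)


def run(events, n_processes=4, threads_per_process=25):
    # Assumption: the "fork" start method, which matches the Linux setup in
    # the question but is not available on every platform.
    ctx = mp.get_context("fork")
    results = ctx.Queue()
    # Deal the events out round-robin, one slice per process.
    chunks = [events[i::n_processes] for i in range(n_processes)]
    procs = [
        ctx.Process(target=process_main, args=(chunk, results, threads_per_process))
        for chunk in chunks
    ]
    for proc in procs:
        proc.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    collected = [results.get(timeout=30) for _ in events]
    for proc in procs:
        proc.join()
    # Arrival order across processes is not deterministic, so sort.
    return sorted(collected)


if __name__ == "__main__":
    print(run(list(range(8)), n_processes=2, threads_per_process=4))
```

With this shape, the process count and the threads-per-process count are the two knobs to tune: fewer processes means fewer interpreter copies in memory, while more processes means less contention between threads inside each interpreter.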
