How to reduce memory usage of threaded Python code?


Question

I wrote about 50 classes that I use to connect to and work with websites using mechanize and threading. They all run concurrently, but they don't depend on each other. So that means 1 class, 1 website, 1 thread. It's not a particularly elegant solution, especially for managing the code, since a lot of the code repeats in each class (but not nearly enough to merge it into one class that takes arguments, as some sites may require additional processing of retrieved data in the middle of methods, like 'login', that others might not need). As I said, it's not elegant, but it works. Needless to say, I welcome any recommendations on how to write this better without the one-class-per-website approach. Adding functionality to each class, or managing the code overall, is a daunting task.

However, I found out that each thread takes about 8 MB of memory, so with 50 running threads we are looking at about 400 MB of usage. If it were running on my own system I wouldn't have a problem with that, but since it's running on a VPS with only 1 GB of memory, it's starting to be an issue. Can you tell me how to reduce the memory usage, or is there any other way to work with multiple sites concurrently?

I used this quick test Python program to check whether it's the data stored in my application's variables that is using the memory, or something else. As you can see in the following code, each thread only calls sleep(), yet each thread is using 8 MB of memory.

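# Python 2 code: uses the old `thread` module and the print statement.
from thread import start_new_thread
from time import sleep

def sleeper():
    # Each thread does nothing but sleep, so it holds no application data.
    try:
        while 1:
            sleep(10000)
    except:
        # Re-raise only while test() is still running; errors raised
        # after `running` has been cleared are swallowed.
        if running: raise

def test():
    global running
    n = 0
    running = True
    try:
        # Start threads until the system refuses to create another one.
        while 1:
            start_new_thread(sleeper, ())
            n += 1
            if not (n % 50):
                print n
    except Exception, e:
        running = False
        print 'Exception raised:', e
    print 'Biggest number of threads:', n

if __name__ == '__main__':
    test()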

When I run this, the output is:

50
100
150
Exception raised: can't start new thread
Biggest number of threads: 188

By removing the running = False line, I can then measure free memory with the free -m command in a shell:

             total       used       free     shared    buffers     cached
Mem:          1536       1533          2          0          0          0
-/+ buffers/cache:       1533          2
Swap:            0          0          0

The calculation behind the figure of about 8 MB per thread is then simple: take the difference between the memory used before the test program runs and the memory used while it is running, and divide it by the maximum number of threads it managed to start.
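The arithmetic described above can be sketched as follows. The 1533 MB "used" value and the 188 threads come from the output shown in this post; the baseline before the test is not given, so `ASSUMED_BASELINE_MB` is a placeholder chosen only for illustration, and `per_thread_mb` is a hypothetical helper name.

```python
def per_thread_mb(used_before_mb, used_during_mb, thread_count):
    """Average memory attributed to each thread, in MB."""
    return (used_during_mb - used_before_mb) / thread_count


USED_DURING_MB = 1533      # "used" column of free -m while the test runs
THREAD_COUNT = 188         # "Biggest number of threads" printed by the test
ASSUMED_BASELINE_MB = 30   # ASSUMPTION: not stated in the post, illustrative only

if __name__ == "__main__":
    estimate = per_thread_mb(ASSUMED_BASELINE_MB, USED_DURING_MB, THREAD_COUNT)
    # With this placeholder baseline the estimate is close to 8 MB per thread.
    print(round(estimate, 1))
```

Any small baseline gives a similar result, since 1533 MB spread over 188 threads is already close to 8 MB each.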

It's probably memory that is only allocated rather than actually used, because according to top the python process uses only about 0.6% of memory.
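One possible explanation, which the post itself does not state: on many Linux systems every new thread reserves a default stack of about 8 MB (the `ulimit -s` value), which would match the figure measured here. If that is the cause, the standard library's `threading.stack_size()` can request a smaller stack for threads started afterwards. The sketch below is written for Python 3; `run_with_small_stacks` and the 256 KiB value are illustrative assumptions, not tuned recommendations.

```python
import threading


def run_with_small_stacks(worker, count, stack_bytes=256 * 1024):
    """Start `count` threads with a reduced stack size, then wait for them.

    `stack_bytes` is an illustrative value; code with deep recursion may
    need a larger stack.
    """
    # stack_size(n) applies to threads started afterwards and returns the
    # previous setting (0 means "platform default").
    previous = threading.stack_size(stack_bytes)
    try:
        threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
        for t in threads:
            t.start()
    finally:
        # Restore the earlier setting so unrelated threads are not affected.
        threading.stack_size(previous)
    for t in threads:
        t.join()


if __name__ == "__main__":
    results = []
    run_with_small_stacks(lambda i: results.append(i * i), 8)
    print(sorted(results))
```

Whether this helps depends on the platform; some systems reject sizes that are too small or do not allow changing the stack size at all.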

Answer

  • Use fewer threads: ThreadPoolExecutor example. Install futures on Python 2.x.
  • Try an asynchronous approach:
    • gevent example
    • twisted example
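The first suggestion above, using fewer threads, might look roughly like this. It is a minimal sketch for Python 3, where `concurrent.futures` is part of the standard library; `fetch_site`, `run_all`, the site names, and the pool size of 5 are all placeholders rather than anything from the original post.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_site(name):
    """Placeholder for the per-site work (connect, log in, scrape, ...)."""
    return "{}: done".format(name)


def run_all(site_names, max_workers=5):
    """Process every site using at most `max_workers` threads at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() hands the sites to the pool and yields results in input order.
        return list(pool.map(fetch_site, site_names))


if __name__ == "__main__":
    sites = ["site{}".format(i) for i in range(50)]  # hypothetical site list
    for line in run_all(sites):
        print(line)
```

With a pool, 50 sites share a handful of threads instead of holding one thread each, so per-thread overhead is paid only `max_workers` times. This fits best when each site's work is a task that finishes, rather than a loop that runs forever.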