Python memory footprint vs. heap size

Question

I'm having some memory issues while using a python script to issue a large solr query. I'm using the solrpy library to interface with the solr server. The query returns approximately 80,000 records. Immediately after issuing the query the python memory footprint as viewed through top balloons to ~190MB.

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND 
8225 root      16   0  193m 189m 3272 S  0.0 11.2   0:11.31 python
...
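
For context, the query itself looks roughly like the sketch below. This is only an illustration of typical solrpy usage; the Solr URL, query string, and field name are assumptions rather than details from the original post.

# Illustrative only: the server URL, query, and field name are assumptions.
import solr

conn = solr.SolrConnection('http://localhost:8983/solr')  # hypothetical Solr instance
response = conn.query('*:*', fields='id', rows=80000)      # query returning ~80,000 records
ids = [doc['id'] for doc in response.results]               # the identifiers held in memory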

At this point, the heap profile as viewed through heapy looks like this:

Partition of a set of 163934 objects. Total size = 14157888 bytes.   
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  80472  49  7401384  52   7401384  52 unicode
     1  44923  27  3315928  23  10717312  76 str
...
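
A snapshot like the one above can be produced with heapy, which ships as part of the guppy package. The sketch below shows the usual pattern; where exactly the calls sit relative to the query code is an assumption.

# Sketch of taking a heap snapshot with heapy (guppy); illustrative only.
from guppy import hpy

hp = hpy()
# ... issue the Solr query here and keep a reference to the results ...
print(hp.heap())  # prints a partition of live objects by kind, as shown above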

The unicode objects represent the unique identifiers of the records from the query. One thing to note is that the total heap size is only 14MB while python is occupying 190MB of physical memory. Once the variable storing the query results falls out of scope, the heap profile correctly reflects the garbage collection:

Partition of a set of 83586 objects. Total size = 6437744 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  44928  54  3316108  52   3316108  52 str

However, the memory footprint remains unchanged:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8225 root      16   0  195m 192m 3432 S  0.0 11.3   0:13.46 python
...

Why is there such a large disparity between python's physical memory footprint and the size of the python heap?

Answer

Python allocates Unicode objects from the C heap. So when you allocate many of them (along with other malloc blocks), then release most of them except for the very last one, C malloc will not return any memory to the operating system, as the C heap can only shrink at the end (not in the middle). Releasing the last Unicode object will release the block at the end of the C heap, which then allows malloc to return it all to the system.

On top of these problems, Python also maintains a pool of freed unicode objects, for faster allocation. So when the last Unicode object is freed, it isn't returned to malloc right away, making all the other pages stuck.
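
The effect is easy to reproduce without Solr at all. The sketch below is Linux-only and uses Python 2 syntax to match the unicode/str split in the heapy output above; the object count and the /proc-based RSS helper are illustrative assumptions. It allocates many small unicode strings, frees all but the last one, and shows that resident memory typically stays near its peak even though heapy would report a much smaller heap.

# Rough demonstration: memory freed in the middle of the C heap is generally
# not returned to the OS. Linux-only; Python 2; numbers are illustrative.
def rss_kb():
    # Resident set size in kB, read from /proc (Linux only).
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])

print 'baseline:          ', rss_kb(), 'kB'
ids = [unicode(i) * 10 for i in xrange(80000)]  # many small unicode objects
print 'after allocation:  ', rss_kb(), 'kB'
del ids[:-1]                                    # drop everything except the last object
print 'after partial free:', rss_kb(), 'kB'     # typically still close to the peak

Whether the last number actually drops depends on the malloc implementation and on where the freed blocks happen to sit in the heap, which is exactly the point of the answer above.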
