What is taking up the memory in this for-loop?


Question


I was playing around with the memory_profiler package (downloaded from pip), more specifically, looking at the memory efficiency of looping through a list by creating a temporary list first vs. looping through an "iterator list".


This was a problem that I encountered a while back and I wanted to benchmark my solution. The problem was that I needed to compare each element in a list with the next element in the same list, until all elements had been "dealt with". So I guess this would be an O(n^2) solution (if the most naive solution is picked, for each element in list, loop through list).
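As an aside, the adjacent-pair comparison described above can be sketched roughly like this (an illustrative example; the function name and the `>` comparison are made up, since the original problem only describes pairing each element with its successor):

```python
# Illustrative sketch of comparing each element with the next one.
# The name and the ">" test are hypothetical, not from the original code.
def count_increasing_pairs(items):
    count = 0
    for a, b in zip(items, items[1:]):  # pair each element with its successor
        if b > a:
            count += 1
    return count

print(count_increasing_pairs([1, 3, 2, 5]))  # -> 2
```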


Anyways, the three functions below are all doing the same thing (more or less); looping over a list that is zipped with itself-offset-by-one.

# Python 2 code: zip() builds a full list, while itertools.izip is lazy.
from memory_profiler import profile  # supplies the @profile decorator

@profile
def zips():
    li = range(1,20000000)
    for tup in zip(li,li[1:]):
        pass
    del li

@profile
def izips():
    from itertools import izip
    li = range(1,20000000)
    for tup in izip(li,li[1:]):
        pass
    del li

@profile
def izips2():
    from itertools import izip
    li = range(1,20000000)
    for tup in izip(li,li[1:]):
        del tup
    del li



if __name__ == '__main__':
    zips()
    # izips()
    # izips2()


The surprising part (to me) was in the memory usage, first I run the zips() function, and although I thought I did clean up, I still ended up with ~1.5 GB in memory:

ipython -m memory_profiler python_profiling.py 
Filename: python_profiling.py

Line #    Mem usage    Increment   Line Contents
================================================
    10                             @profile
    11    27.730 MB     0.000 MB   def zips():
    12   649.301 MB   621.570 MB    li = range(1,20000000)
    13  3257.605 MB  2608.305 MB    for tup in zip(li,li[1:]):
    14  1702.504 MB -1555.102 MB        pass
    15  1549.914 MB  -152.590 MB    del li


Then I close the interpreter instance and reopen it for running the next test, which is the izips() function:

ipython -m memory_profiler python_profiling.py 
Filename: python_profiling.py

Line #    Mem usage    Increment   Line Contents
================================================
    17                             @profile
    18    27.449 MB     0.000 MB   def izips():
    19    27.449 MB     0.000 MB    from itertools import izip
    20   649.051 MB   621.602 MB    li = range(1,20000000)
    21  1899.512 MB  1250.461 MB    for tup in izip(li,li[1:]):
    22  1746.922 MB  -152.590 MB        pass
    23  1594.332 MB  -152.590 MB    del li


And then finally I ran a test (again after restarting the interpreter in between) where I tried to explicitly delete the tuple in the for-loop to try to make sure that its memory would be freed (maybe I'm not thinking that correctly?). Turns out that didn't make a difference so I'm guessing that either I'm not prompting GC or that is not the source of my memory overhead.

ipython -m memory_profiler python_profiling.py 
Filename: python_profiling.py

Line #    Mem usage    Increment   Line Contents
================================================
    25                             @profile
    26    20.109 MB     0.000 MB   def izips2():
    27    20.109 MB     0.000 MB    from itertools import izip
    28   641.676 MB   621.566 MB    li = range(1,20000000)
    29  1816.953 MB  1175.277 MB    for tup in izip(li,li[1:]):
    30  1664.387 MB  -152.566 MB        del tup
    31  1511.797 MB  -152.590 MB    del li


Bottom line: I thought that the overhead of the for loop itself was minimal, and therefore I was expecting just a little bit more than ~620 MB (the memory it takes to store the list), but instead it looks like I have ~2 lists of 20,000,000 elements in memory, plus even more overhead. Can anyone help me explain what all this memory is being used for? (And what is taking up that ~1.5 GB at the end of each run?)

Answer


Note that the OS assigns memory in chunks, and doesn't necessarily reclaim it all in one go. I've found the memory_profiler package to be wildly inaccurate, because it appears to fail to take that into account.


Your li[1:] slice creates a new list with (2*10**7) - 1 elements, nearly a whole new copy, easily doubling the memory space required for the lists. The zip() call also returns a full new list object, the output of the zipping action, again requiring memory for the intermediary result, plus 20 million 2-element tuples.
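The copying behaviour of the slice is easy to confirm with a small, separate illustration (not from the profiled code; `itertools.islice` is shown here as the copy-free counterpart):

```python
from itertools import islice
import sys

li = list(range(10))

# li[1:] builds a brand-new list object with its own element references:
sliced = li[1:]
print(sliced is li)           # False: a distinct list, i.e. a near-full copy
print(sys.getsizeof(sliced))  # the copy carries its own backing storage

# islice yields the same elements lazily, without materializing a list:
lazy = islice(li, 1, None)
print(list(lazy) == sliced)   # True: same elements, but no up-front copy
```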


You could use a new iterator instead of slicing:

def zips():
    from itertools import izip
    li = range(1, 20000000)
    next_li = iter(li)  # lightweight iterator over li, no copy made
    next(next_li)       # advance one step so the pairing is offset by one
    for tup in izip(li, next_li):
        pass
    del li


The list iterator returned by the iter() call is much more lightweight; it only keeps a reference to the original list and a position pointer. Combining this with izip() avoids creating the output list as well.
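For what it's worth, this whole issue largely disappears in Python 3, where both range() and zip() are lazy; combining zip() with itertools.islice (or itertools.pairwise on 3.10+) avoids the slice copy as well. A minimal sketch:

```python
from itertools import islice

li = range(1, 20000000)  # Python 3 range: lazy, no 20-million-element list

# zip() is lazy in Python 3, and islice avoids the slice copy, so no
# intermediate list (of elements or of tuples) is ever materialized.
pairs = zip(li, islice(li, 1, None))
print(next(pairs))  # -> (1, 2)
```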
