Python: how to split and return a list from a function to avoid memory error


Question

I am currently working with a function that enumerates all cycles within a specific array (a digraph), and I need all of them. This function returns all cycles as a list of lists (each sub-list being a cycle, e.g. result = [[0,1,0], [0,1,2,0]] is a list containing 2 cycles starting and ending at node 0). However, there are millions of cycles, so for big digraphs I get a memory error (MemoryError: MemoryError()) since the list of lists containing all cycles is too big.

I would like the function to split the result into several arrays so that I do not get the memory error. Is that possible? And would it solve the issue?

I tried to do that by splitting the results array into a list of sub-results (each sub-result has a maximum size, say 10 million, which is below the 500 million maximum stated here: How Big can a Python Array Get?). The idea is that the result is a list containing the sub-results: result = [sub_result1, sub_result2]. However, I get a different memory error: no mem for new parser.

The way I do that is as follows:

if not SplitResult:
    result = []  # flat list accumulating every circuit found
    # append each cycle to the result list
    if cycle_found():  # cycle_found() just for example
        result.append(new_cycle)
else:
    result = [[]]  # list of sub-lists accumulating the circuits found
    # append each cycle to the LAST result sub-list
    if cycle_found():  # cycle_found() just for example
        result[-1].append(new_cycle)
    # start a new sub-list once the LAST sub-list
    # reaches the size limit (ResultSize)
    if len(result[-1]) == ResultSize:
        result.append([])
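
For illustration, here is a self-contained toy version of the same accumulation scheme (cycle_found() and new_cycle are just placeholders above; here a plain range stands in for the stream of found cycles):

ResultSize = 3
result = [[]]
for new_cycle in range(8):  # stand-in for the cycles as they are found
    result[-1].append(new_cycle)
    if len(result[-1]) == ResultSize:
        result.append([])  # start a fresh sub-list once the last one is full

print(result)  # [[0, 1, 2], [3, 4, 5], [6, 7]]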

Maybe the issue is that I keep all the sub-results inside a single results list. In that case, how can I return a variable number of results from a function?

In particular, I divide all simple cycles of a 12-node complete digraph into sub-lists of 10 million cycles each. I know there are 115,443,382 cycles in total, so I should get a list with 16 sub-lists, the first 15 containing 10 million cycles each and the last one containing 443,382. Instead, I get a different memory error: no mem for new parser.

This procedure works for an 11-node complete digraph, which returns 2 sub-lists, the first containing 10 million cycles (10000000) and the other containing 976,184. In case it is of any help, their memory footprint is

>>> sys.getsizeof(cycles_list[0])
40764028
>>> sys.getsizeof(cycles_list[1])
4348732

Then, I guess we should also add the size of each cycle in the list:

>>> sys.getsizeof(cycles_list[0][4])
56
>>> cycles_list[0][4]
[0, 1, 2, 3, 4, 0]

Any help will be most welcome,

Thanks for reading,

Aleix

Solution

Thank you for your suggestions. Indeed, the right approach to avoid memory issues when returning arrays is simply to avoid creating such big result arrays in the first place. Thus, generator functions are the way forward.

Generator functions are well explained here: What does the "yield" keyword do in Python? I would just add that a normal function becomes a generator function at the very moment you add a yield to it. Also, if you add a return statement, the generation of values ends when it is reached (some generator functions have no "return" and are thus infinite).
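
A minimal sketch of both points (a toy example, not the cycle-search code itself):

def count_up_to(limit):
    # the yield below is what turns this function into a generator
    n = 0
    while True:
        if n == limit:
            return  # reaching return ends the generation of values
        yield n     # hand one value to the caller, then pause here
        n += 1

# values are produced one at a time, so the full sequence never sits in memory
for value in count_up_to(3):
    print(value)  # prints 0, 1, 2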

Despite the simple use of generators, I had a hard time transforming the original function into a generator function, since it is a recursive function (i.e. it calls itself). However, this entry shows what a recursive generator function looks like: Help understanding how this recursive python function works?, so I could apply the same pattern to my function.
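
The key pattern is to re-yield whatever the recursive call produces. A minimal sketch of the idea on a hypothetical toy graph (not my actual cycle enumerator; yield from needs Python 3.3+, on older versions re-yield inside a for loop instead):

def all_simple_paths(graph, node, path=()):
    # recursive generator: each recursive call is itself a generator,
    # so its results are passed straight through with "yield from"
    path = path + (node,)
    yield path
    for neighbour in graph.get(node, ()):
        if neighbour not in path:  # keep paths simple (no repeated nodes)
            yield from all_simple_paths(graph, neighbour, path)

graph = {0: (1, 2), 1: (2,), 2: ()}
for p in all_simple_paths(graph, 0):
    print(p)  # (0,), (0, 1), (0, 1, 2), (0, 2)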

Again, thanks to all for your support,

Aleix

