Free up memory by deleting numpy arrays


Problem description

I have written a fatigue analysis program with a GUI. The program takes strain information for unit loads for each element of a finite element model, reads in a load case using np.genfromtxt('loadcasefilename.txt') and then does some fatigue analysis and saves the result for each element in another array.

The load cases are about 32Mb as text files and there are 40 or so which get read and analysed in a loop. The loads for each element are interpolated by taking slices of the load case array.

The GUI and fatigue analysis run in separate threads. When you click 'Start' on the fatigue analysis it starts the loop over the load cases in the fatigue analysis.

This brings me onto my problem. If I have a lot of elements, the analysis will not finish. How early it quits depends on how many elements there are, which makes me think it might be a memory problem. I've tried fixing this by deleting the load case array at the end of each loop (after deleting all the arrays which are slices of it) and running gc.collect() but this has not had any success.

In MatLab, I'd use the 'pack' function to write the workspace to disk, clear it, and then reload it at the end of each loop. I know this isn't good practice but it would get the job done! Can I do the equivalent in Python somehow?

The code is below:

for LoadCaseNo in range(len(LoadCases[0]['LoadCaseLoops'])):#range(1):#xxx
    #Get load case data
    self.statustext.emit('Opening current load case file...')
    LoadCaseFilePath=LoadCases[0]['LoadCasePaths'][LoadCaseNo][0]
    #TK: load case paths may be different
    try:
        with open(LoadCaseFilePath):
            pass
    except Exception as e:
        self.statustext.emit(str(e))


    LoadCaseLoops=LoadCases[0]['LoadCaseLoops'][LoadCaseNo,0]
    LoadCase=np.genfromtxt(LoadCaseFilePath,delimiter=',')

    LoadCaseArray=np.array(LoadCases[0]['LoadCaseLoops'])
    LoadCaseArray=LoadCaseArray/np.sum(LoadCaseArray,axis=0)
    #Loop through sections
    for SectionNo in range(len(Sections)):#range(100):#xxx 
        SectionCount=len(Sections)
        #Get section data
        Elements=Sections[SectionNo]['elements']
        UnitStrains=Sections[SectionNo]['strains'][:,1:]
        Nodes=Sections[SectionNo]['nodes']
        rootdist=Sections[SectionNo]['rootdist']
        #Interpolate load case data at this section
        NeighbourFind=rootdist-np.reshape(LoadCase[0,1:],(1,-1))
        NeighbourFind[NeighbourFind<0]=1e100
        nearest=np.unravel_index(NeighbourFind.argmin(), NeighbourFind.shape)
        nearestcol=int(nearest[1])
        Distance0=LoadCase[0,nearestcol+1]
        Distance1=LoadCase[0,nearestcol+7]
        MxLow=LoadCase[1:,nearestcol+1]
        MxHigh=LoadCase[1:,nearestcol+7]
        MyLow=LoadCase[1:,nearestcol+2]
        MyHigh=LoadCase[1:,nearestcol+8]
        MzLow=LoadCase[1:,nearestcol+3]
        MzHigh=LoadCase[1:,nearestcol+9]
        FxLow=LoadCase[1:,nearestcol+4]
        FxHigh=LoadCase[1:,nearestcol+10]
        FyLow=LoadCase[1:,nearestcol+5]
        FyHigh=LoadCase[1:,nearestcol+11]
        FzLow=LoadCase[1:,nearestcol+6]
        FzHigh=LoadCase[1:,nearestcol+12]
        InterpFactor=(rootdist-Distance0)/(Distance1-Distance0)
        Mx=MxLow+(MxHigh-MxLow)*InterpFactor[0,0]
        My=MyLow+(MyHigh-MyLow)*InterpFactor[0,0]
        Mz=MzLow+(MzHigh-MzLow)*InterpFactor[0,0]
        Fx=-FxLow+(FxHigh-FxLow)*InterpFactor[0,0]
        Fy=-FyLow+(FyHigh-FyLow)*InterpFactor[0,0]
        Fz=FzLow+(FzHigh-FzLow)*InterpFactor[0,0]
        #Loop through section coordinates
        for ElementNo in range(len(Elements)):
            MaterialID=int(Elements[ElementNo,1])
            if Materials[MaterialID]['curvefit'][0,0]!=3:
                StrainHist=UnitStrains[ElementNo,0]*Mx+UnitStrains[ElementNo,1]*My+UnitStrains[ElementNo,2]*Fz

            elif Materials[MaterialID]['curvefit'][0,0]==3:

                StrainHist=UnitStrains[ElementNo,3]*Fx+UnitStrains[ElementNo,4]*Fy+UnitStrains[ElementNo,5]*Mz

            EndIn=len(StrainHist)
            Extrema=np.bitwise_or(
                np.bitwise_and(StrainHist[1:EndIn-1]<=StrainHist[0:EndIn-2], StrainHist[1:EndIn-1]<=StrainHist[2:EndIn]),
                np.bitwise_and(StrainHist[1:EndIn-1]>=StrainHist[0:EndIn-2], StrainHist[1:EndIn-1]>=StrainHist[2:EndIn]))
            Extrema=np.concatenate((np.array([True]),Extrema,np.array([True])),axis=0)
            Extrema=StrainHist[np.where(Extrema==True)]
            del StrainHist
            #Do fatigue analysis
        self.statustext.emit('Analysing load case '+str(LoadCaseNo+1)+' of '+str(len(LoadCases[0]['LoadCaseLoops']))+' - '+str(((SectionNo+1)*100)/SectionCount)+'% complete')
        del MxLow,MxHigh,MyLow,MyHigh,MzLow,MzHigh,FxLow,FxHigh,FyLow,FyHigh,FzLow,FzHigh,Mx,My,Mz,Fx,Fy,Fz,Distance0,Distance1
    gc.collect()

Answer

There's obviously a retain cycle or other leak somewhere, but without seeing your code, it's impossible to say more than that. But since you seem to be more interested in workarounds than solutions…

In MatLab, I'd use the 'pack' function to write the workspace to disk, clear it, and then reload it at the end of each loop. I know this isn't good practice but it would get the job done! Can I do the equivalent in Python somehow?

No, Python doesn't have any equivalent to pack. (Of course if you know exactly what set of values you want to keep around, you can always np.savetxt or pickle.dump or otherwise stash them, then exec or spawn a new interpreter instance, then np.loadtxt or pickle.load or otherwise restore those values. But then if you know exactly what set of values you want to keep around, you probably aren't going to have this problem in the first place, unless you've actually hit an unknown memory leak in NumPy, which is unlikely.)
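If you really did want to go that route, a rough sketch might look like this. The file names and the separate analyse_one_case.py worker script are made up for illustration; you would have to write that script yourself to read the input file, do the work, and dump its results:

import pickle
import subprocess

def run_case_in_fresh_interpreter(case_path):
    # Stash the inputs the worker needs on disk.
    with open('case_input.pkl', 'wb') as f:
        pickle.dump({'case_path': case_path}, f)
    # Run a hypothetical worker script in a brand-new interpreter; all the
    # memory it uses goes back to the OS when that process exits.
    subprocess.check_call(['python', 'analyse_one_case.py',
                           'case_input.pkl', 'case_output.pkl'])
    # Reload only the (small) results.
    with open('case_output.pkl', 'rb') as f:
        return pickle.load(f)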

But it has something that may be better. Kick off a child process to analyze each element (or each batch of elements, if they're small enough that the process-spawning overhead matters), send the results back in a file or over a queue, then quit.

For example, if you're doing this:

def analyze(thingy):
    a = build_giant_array(thingy)
    result = process_giant_array(a)
    return result

total = 0
for thingy in thingies:
    total += analyze(thingy)

You can change it to this:

import multiprocessing

def wrap_analyze(thingy, q):
    q.put(analyze(thingy))

total = 0
for thingy in thingies:
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=wrap_analyze, args=(thingy, q))
    p.start()
    p.join()
    total += q.get()

(This assumes that each thingy and each result is smallish and picklable. If it's a huge NumPy array, look into NumPy's shared memory wrappers, which are designed to make things much easier when you need to share memory directly between processes instead of passing it.)
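For example, here is one way to share a big array between processes without copying it, using the standard library's multiprocessing.Array. This is just a sketch; the "shared memory wrappers" mentioned above may refer to other packages as well:

import multiprocessing
import numpy as np

def worker(shared, shape):
    # Reinterpret the shared buffer as a NumPy array; no copy is made,
    # so changes here are visible to the parent process.
    a = np.frombuffer(shared.get_obj()).reshape(shape)
    a *= 2.0

if __name__ == '__main__':
    shape = (1000, 100)
    shared = multiprocessing.Array('d', shape[0] * shape[1])  # doubles, zero-filled
    p = multiprocessing.Process(target=worker, args=(shared, shape))
    p.start()
    p.join()
    result = np.frombuffer(shared.get_obj()).reshape(shape)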

But you may want to look at what multiprocessing.Pool can do to automate this for you (and to make it easier to extend the code to, e.g., use all your cores in parallel). Notice that it has a maxtasksperchild parameter, which you can use to recycle the pool processes every, say, 10 thingies, so they don't run out of memory.
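A sketch of what that might look like follows; the per-task work here is a stand-in, not the asker's fatigue analysis:

import multiprocessing
import numpy as np

def analyze(thingy):
    # Stand-in for the real work: build a largish array, reduce it to a
    # small result, and return only the small result.
    a = np.full((1000, 1000), thingy, dtype=np.float64)
    return float(a.sum())

if __name__ == '__main__':
    thingies = range(40)
    # maxtasksperchild=10 recycles each worker after 10 tasks, so whatever
    # memory it has accumulated is handed back to the OS periodically.
    with multiprocessing.Pool(processes=4, maxtasksperchild=10) as pool:
        total = sum(pool.imap_unordered(analyze, thingies))
    print(total)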

But back to actually trying to solve things briefly:

I've tried fixing this by deleting the load case array at the end of each loop (after deleting all the arrays which are slices of it) and running gc.collect() but this has not had any success.

None of that should make any difference at all. If you're just reassigning all the local variables to new values each time through the loop, and aren't keeping references to them anywhere else, then they're just going to get freed up anyway, so you'll never have more than 2 at a (brief) time. And gc.collect() only helps if there are reference cycles. So, on the one hand, it's good news that these had no effect—it means there's nothing obviously stupid in your code. On the other hand, it's bad news—it means that whatever's wrong isn't obviously stupid.

Usually people see this because they keep growing some data structure without realizing it. For example, maybe you vstack all the new rows onto the old version of giant_array instead of onto an empty array, then delete the old version… but it doesn't matter, because each time through the loop, giant_array isn't 5*N, it's 5*N, then 10*N, then 15*N, and so on. (That's just an example of something stupid I did not long ago… Again, it's hard to give more specific examples while knowing nothing about your code.)
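As a toy illustration of that growth pattern (not the asker's code):

import numpy as np

rows = np.ones((5, 3))          # the "new rows" produced each iteration

# Buggy: stacking onto the old giant_array, so it keeps 5, 10, 15, ... rows.
giant_array = np.empty((0, 3))
for _ in range(3):
    giant_array = np.vstack((giant_array, rows))

# Intended: start from an empty array each time, so the old version is
# dropped and the size stays constant at 5 rows.
for _ in range(3):
    giant_array = np.vstack((np.empty((0, 3)), rows))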
