How do I loop through a large dataset in python without getting a MemoryError?

Question

I have a large series of raster datasets representing monthly rainfall over several decades. I've written a script in Python that loops over each raster and does the following:

  1. Converts the raster to a numpy masked array,
  2. Performs lots of array algebra to calculate a new water level,
  3. Writes the result to an output raster.
  4. Repeats.

The script is just a long list of array algebra equations enclosed by a loop statement.
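
For concreteness, here is a minimal sketch of the kind of loop described. The `read_raster`/`write_raster` helpers, file names, and coefficients are all hypothetical stand-ins (a real script would read and write rasters with a library such as GDAL or rasterio):

```python
import numpy as np

def read_raster(path):
    """Hypothetical helper: load a raster into a numpy masked array.
    A real script would use GDAL, rasterio, or similar here."""
    data = np.load(path)                # placeholder for a real raster read
    return np.ma.masked_invalid(data)   # mask NoData / NaN cells

def write_raster(path, array):
    """Hypothetical helper: write a masked array back out as a raster."""
    np.save(path, array.filled(np.nan))  # placeholder for a real raster write

water_level = read_raster("initial_level.npy")

for month in range(1, 481):                           # ~40 years of monthly data
    rainfall = read_raster(f"rain_{month:04d}.npy")   # 1. raster -> masked array
    recharge = rainfall * 0.25                        # 2. array algebra (example)
    water_level = water_level + recharge - 0.1 * water_level
    write_raster(f"level_{month:04d}.npy", water_level)  # 3. write output raster
    # 4. repeat: water_level carries over as the start point of the next month
```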

Everything works well if I just run the script on a small part of my data (say 20 years' worth), but if I try to process the whole lot I get a MemoryError. The error doesn't give any more information than that (except it highlights the line in the code at which Python gave up).

Unfortunately, I can't easily process my data in chunks - I really need to be able to do the whole lot at once. This is because, at the end of each iteration, the output (water level) is fed back into the next iteration as the start point.

My understanding of programming is very basic at present, but I thought that all of my objects would just be overwritten on each loop. I (stupidly?) assumed that if the code managed to loop successfully once then it should be able to loop indefinitely without using up more and more memory.
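
That assumption is essentially correct in CPython: plain reassignment rebinds the name, and once the old array's reference count drops to zero it is freed immediately, with no garbage-collector run required. One subtlety is worth knowing, though (the array shapes here are illustrative only):

```python
import numpy as np

a = np.zeros((10_000, 10_000))   # ~800 MB of float64
a = np.zeros((10_000, 10_000))   # the new array is allocated *before* the old
                                 # reference is dropped, so peak usage is briefly
                                 # two arrays; the old one is then freed at once,
                                 # but only if nothing else still references it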

I've tried reading various bits of documentation and have discovered something called the "Garbage Collector", but I feel like I'm getting out of my depth and my brain's melting! Can anyone offer some basic insight into what actually happens to objects in memory when my code loops? Is there a way of freeing up memory at the end of each loop, or is there some more "Pythonic" way of coding which avoids this problem altogether?

Answer

You don't need to concern yourself with memory management, and especially not with the garbage collector, which has a very specific task (reclaiming reference cycles) that you most likely don't even need here. Python will always collect the memory it can and reuse it.
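
That "very specific task" is reclaiming reference cycles, which plain array algebra rarely creates; everything else is freed by reference counting as soon as the last reference goes away. A quick illustration:

```python
import gc

class Node:
    pass

a, b = Node(), Node()
a.other, b.other = b, a   # a reference cycle: the refcounts can never reach zero
del a, b                  # the cycle is now unreachable, but not yet freed

print(gc.collect())       # the cycle collector reclaims it; prints a count > 0
```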

There are just two possible reasons for your problem: either the data you are trying to load is too large to fit into memory, or your calculations store data somewhere (a list, dict, or anything else that persists between iterations) and that storage grows and grows. A memory profiler can help you find it.
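
For instance, the standard library's tracemalloc module can show which source lines are accumulating memory between two snapshots; the leaky `history` list below is just an illustration of the kind of hidden growth it would reveal:

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

history = []
for i in range(100):
    result = [0.0] * 10_000
    history.append(result)   # storage that persists between iterations and grows

after = tracemalloc.take_snapshot()
for stat in after.compare_to(before, "lineno")[:3]:
    print(stat)              # biggest allocation growth, reported per source line
```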
