How to overcome a MemoryError from numpy.unique
Question
I am using NumPy version 1.11.1 and have to deal with a two-dimensional array of shape
my_arr.shape = (25000, 25000)
All values are integers, and I need a list of the array's unique values. When using lst = np.unique(my_arr), I get:
Traceback (most recent call last):
  File "<pyshell#38>", line 1, in <module>
    palette = np.unique(arr)
  File "c:\Python27\lib\site-packages\numpy\lib\arraysetops.py", line 176, in unique
    ar = np.asanyarray(ar).flatten()
MemoryError
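The last frame of the traceback is the flatten() call inside np.unique, and flatten() always returns a copy. The sketch below estimates how much memory the input and that copy need together. It assumes a 4- or 8-byte integer dtype, since the question only says the values are integers, and unique_peak_bytes is a hypothetical helper. The result is a lower bound, because np.unique allocates further temporaries after the copy.

```python
import numpy as np


def unique_peak_bytes(shape, dtype):
    """Hypothetical helper: bytes for the input array and for the
    input plus the flattened copy that np.unique makes (lower bound)."""
    n_elements = int(np.prod(shape, dtype=np.int64))
    array_bytes = n_elements * np.dtype(dtype).itemsize
    # flatten() copies, so the original and the copy coexist in memory.
    return array_bytes, 2 * array_bytes


GIB = 2**30
for dtype in (np.int32, np.int64):
    array_bytes, peak_bytes = unique_peak_bytes((25000, 25000), dtype)
    print(f"{np.dtype(dtype).name}: array {array_bytes / GIB:.2f} GiB, "
          f"with flattened copy {peak_bytes / GIB:.2f} GiB")
```

With 4-byte integers this comes to about 2.33 GiB for the array and about 4.66 GiB once the flattened copy exists; with 8-byte integers it is about 4.66 GiB and 9.31 GiB.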
My machine has only 8 GB of RAM, but I tried it on another machine with 16 GB of RAM and the result was the same. Monitoring memory and CPU usage doesn't suggest that the problem is related to RAM or CPU.
In principle, I know which values the array contains, but what if the input changes? Also, if I want to replace values in the array with another value (say, all 2s with 0), will that need a lot of RAM as well?
Answer
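A 32-bit Python process can't address more than 4 GiB of RAM (in practice often only about 2.5 GiB), no matter how much physical memory the machine has, which would also explain why the 16 GB machine gave the same result. The obvious fix is to use a 64-bit build of Python. If that doesn't work, another solution is to use numpy.memmap and memory-map the array to a file stored on disk.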
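A minimal sketch of both suggestions follows. It first prints whether the running interpreter is a 32-bit or 64-bit build. Then it shows one way to use a numpy.memmap. Passing the memmap straight to np.unique would still build the flattened copy in RAM, so the sketch reads the array in blocks of rows and merges the per-block results with np.union1d. This block-wise approach is my own illustration rather than part of the answer, and it assumes the number of distinct values is small, as the variable name palette in the traceback suggests. The helper unique_memmap, its rows_per_chunk parameter, the file name my_arr.dat and the small demo shape are all hypothetical.

```python
import os
import struct
import tempfile

import numpy as np

# Pointer size of the running interpreter: 32 on a 32-bit build, 64 on a 64-bit build.
print("Python is %d-bit" % (struct.calcsize("P") * 8))


def unique_memmap(arr, rows_per_chunk=1000):
    """Hypothetical helper: unique values of a 2-D (possibly memory-mapped)
    array, computed one block of rows at a time. Only one block, its
    flattened copy and the running set of unique values are held in RAM,
    which assumes the number of distinct values is small."""
    result = np.empty(0, dtype=arr.dtype)
    for start in range(0, arr.shape[0], rows_per_chunk):
        block = np.asarray(arr[start:start + rows_per_chunk])
        # np.union1d returns the sorted union without duplicates.
        result = np.union1d(result, np.unique(block))
    return result


# Demo on a small disk-backed array; "my_arr.dat" and the shape are
# illustrative, not taken from the question.
shape = (2000, 2000)
path = os.path.join(tempfile.mkdtemp(), "my_arr.dat")
mm = np.memmap(path, dtype=np.int32, mode="w+", shape=shape)

# Fill the file block by block so the whole array never sits in RAM at once.
cols = np.arange(shape[1], dtype=np.int32)
for start in range(0, shape[0], 500):
    rows = np.arange(start, min(start + 500, shape[0]), dtype=np.int32)
    mm[start:start + len(rows)] = (rows[:, None] + cols) % 7
mm.flush()

print(unique_memmap(mm, rows_per_chunk=500))
```

The demo fills the file with (row + column) % 7, so the unique values are 0 through 6. Smaller values of rows_per_chunk lower peak memory at the cost of more np.union1d calls.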