numpy.std on memmapped ndarray fails with MemoryError
Question
I have a huge (30GB) memory-mapped ndarray:
arr = numpy.memmap(afile, dtype=numpy.float32, mode="w+", shape=(n, n,))
After filling it in with some values (which goes fine - peak memory usage stays below 1GB), I want to calculate the standard deviation:
print('stdev: {0:4.4f}\n'.format(numpy.std(arr)))
This line fails with a MemoryError.
I am not sure why this fails. I would be grateful for tips on how to calculate this in a memory-efficient way.
Environment: venv + Python 3.6.2 + NumPy 1.13.1
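For reference, a small-scale sketch of the setup described above (the file path, block size, and random fill are placeholders; the question's real array is a 30GB float32 square matrix):

```python
import os
import tempfile
import numpy as np

n = 1024  # stand-in; the question's n is far larger (~90000 for ~30GB of float32)
path = os.path.join(tempfile.mkdtemp(), "afile.dat")
arr = np.memmap(path, dtype=np.float32, mode="w+", shape=(n, n))

# Filling row blocks one at a time keeps memory usage low: only the
# current block is resident in RAM, the rest lives in the file.
rng = np.random.RandomState(0)
step = 256
for start in range(0, n, step):
    rows = min(step, n - start)
    arr[start:start + rows] = rng.rand(rows, n)
arr.flush()
```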
Answer

numpy.std subtracts the mean from the whole array internally, which materializes a full-size floating-point temporary; on a 30GB memmap that temporary alone exhausts memory. Accumulating the moments block by block avoids any full-size temporary:
import math
import numpy as np

# The question's array is 2-D (n, n); flatten it to a 1-D view first
# (reshaping a contiguous memmap does not load it into memory).
arr = arr.reshape(-1)

BLOCKSIZE = 1024**2
# For numerical stability. The closer this is to mean(arr), the better.
PIVOT = float(arr[0])
n = len(arr)
sum_ = 0.
sum_sq = 0.
for block_start in range(0, n, BLOCKSIZE):
    # Slicing then subtracting makes an in-memory copy of one block;
    # an in-place "block_data -= PIVOT" would write the shifted values
    # back into the memmapped file.
    block_data = arr[block_start:block_start + BLOCKSIZE] - PIVOT
    sum_ += math.fsum(block_data)
    sum_sq += math.fsum(block_data**2)
stdev = np.sqrt(sum_sq / n - (sum_ / n)**2)
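As a sanity check (a sketch with made-up data, not part of the original answer), the same pivot-shifted blockwise moments can be wrapped in a function and compared against numpy.std on an array small enough to fit in memory:

```python
import math
import numpy as np

def blockwise_std(arr, blocksize=1024**2):
    # Pivot-shifted single-pass moments, as in the answer above.
    pivot = float(arr[0])
    n = len(arr)
    sum_ = 0.0
    sum_sq = 0.0
    for start in range(0, n, blocksize):
        block = arr[start:start + blocksize] - pivot  # in-memory copy
        sum_ += math.fsum(block)
        sum_sq += math.fsum(block ** 2)
    return np.sqrt(sum_sq / n - (sum_ / n) ** 2)

data = np.random.RandomState(0).rand(100_000).astype(np.float32)
print(blockwise_std(data, blocksize=4096), np.std(data, dtype=np.float64))
```

The two results should agree to well within float32 precision; `math.fsum` keeps the per-block sums exact in double precision.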