Why does this giant (non-sparse) numpy matrix fit in RAM

Question

I am very confused by what is reported by numpy.ndarray.nbytes.

I just created an identity matrix of size 1 million (10^6), which therefore has 1 trillion elements (10^12). Numpy reports that this array is 7.28TB, yet the python process only uses 3.98GB of memory, as reported by the OS X Activity Monitor.

  • Is the whole array contained in memory?
  • Does Numpy somehow compress its representation, or is that handled by the OS?
  • If I simply calculate y = 2 * x, which should be the same size as x, the process memory increases to about 30GB, until it gets killed by the OS. Why, and what kind of operations can I conduct on x without the memory usage expanding so much?

Here is the code I used:

import numpy as np
# The size must be an integer; recent NumPy versions reject the float 1e6.
x = np.identity(10**6)
x.size
# 1000000000000
x.nbytes / 1024 ** 4  # nominal size in TiB
# 7.275957614183426
y = 2 * x
# python console exits and terminal shows: Killed: 9

Recommended answer

On Linux (and I assume the same thing happens on Mac), when a program allocates memory, the OS doesn't actually back it with physical RAM until the program uses it.

If the program never uses the memory, the OS doesn't have to waste RAM on it. But this puts the OS in a tight spot when a program has requested a ton of memory and actually needs to use it, and the OS doesn't have enough.
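The lazy allocation described above can be observed on a small scale. The following is a minimal sketch, not a definitive measurement: it assumes a Unix-like system (the `resource` module is not available on Windows), a 4 KiB page size, and that `np.zeros` obtains untouched zero pages from the OS, which is typical but not guaranteed. The helper `peak_rss_bytes` is a hypothetical name introduced here for illustration.

```python
import resource
import sys

import numpy as np


def peak_rss_bytes():
    """Peak resident set size of this process, in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    return peak if sys.platform == "darwin" else peak * 1024


PAGE = 4096                     # assumed page size in bytes
n_items = 256 * 1024**2 // 8    # 256 MiB worth of float64 values

before = peak_rss_bytes()
x = np.zeros(n_items)           # requested, but typically not yet backed by RAM
after_alloc = peak_rss_bytes()

# Write one value per (assumed) page, which forces the OS to back every page.
x[:: PAGE // x.itemsize] = 1.0
after_touch = peak_rss_bytes()

mib = 1024**2
print(f"nbytes reported by numpy:       {x.nbytes / mib:.0f} MiB")
print(f"peak RSS growth after np.zeros: {(after_alloc - before) / mib:.0f} MiB")
print(f"peak RSS growth after touching: {(after_touch - before) / mib:.0f} MiB")
```

On a system that allocates lazily, the growth after `np.zeros` should stay far below the growth after the pages are written, even though `nbytes` is the same in both cases. The exact figures depend on the OS and the allocator.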

When that happens, the OS may either start killing off other, less important processes and give their RAM to the requesting process, or just kill the requesting process (which is what is happening here).

The initial 4GB of memory that Python uses is likely the pages where numpy set the 1s on the diagonal of the identity matrix; the rest of the pages haven't been used yet. A math operation like 2*x starts accessing (and thus allocating) all the pages, until the OS runs out of memory and kills your process.
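The 4GB figure can be sanity-checked with a little arithmetic. This is a sketch under stated assumptions: a 4 KiB page size, a C-contiguous float64 array that starts on a page boundary, and an identity matrix built by writing only the diagonal into otherwise untouched zero pages. Each row is 8 MB long, so consecutive diagonal elements are much farther apart than one page and every one of them lands on its own page.

```python
n = 10**6          # side length of the identity matrix from the question
itemsize = 8       # bytes per float64 element
page_size = 4096   # assumed page size in bytes

# Byte distance between consecutive diagonal elements (i, i) and (i+1, i+1)
# in a C-contiguous n x n array.
diag_stride = (n + 1) * itemsize

# Distinct pages that hold at least one diagonal element.
pages_touched = len({(i * diag_stride) // page_size for i in range(n)})

resident_bytes = pages_touched * page_size
total_bytes = n * n * itemsize

print(f"pages touched: {pages_touched}")
print(f"resident:      {resident_bytes / 1024**3:.2f} GiB")
print(f"nominal size:  {total_bytes / 1024**4:.2f} TiB")
```

Under these assumptions one million pages are touched, about 4.1 GB (3.8 GiB), which is close to the 3.98GB shown by the activity monitor, while the nominal size matches the 7.28 reported via nbytes.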
