Why can a 352GB NumPy ndarray be used on an 8GB memory macOS computer?


Problem Description

import numpy as np

array = np.zeros((210000, 210000)) # default numpy.float64
array.nbytes # 210000 * 210000 * 8 bytes = 352,800,000,000 bytes ≈ 352.8 GB

When I run the above code on my 8GB memory MacBook with macOS, no error occurs. But running the same code on a 16GB memory PC with Windows 10, or a 12GB memory Ubuntu laptop, or even on a 128GB memory Linux supercomputer, the Python interpreter will raise a MemoryError. All the test environments have 64-bit Python 3.6 or 3.7 installed.

Solution

@Martijn Pieters' answer is on the right track, but not quite right: this has nothing to do with memory compression, but instead it has to do with virtual memory.

For example, try running the following code on your machine:

arrays = [np.zeros((21000, 21000)) for _ in range(0, 10000)]  # ~3.5 GB each, ~32 TiB total

This code allocates 32 TiB of memory, but you won't get an error (at least I didn't, on Linux). If I check htop, I see the following:

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
31362 user       20   0 32.1T 69216 12712 S  0.0  0.4  0:00.22 python

This is because the OS is perfectly willing to overcommit on virtual memory. It won't actually assign pages to physical memory until it needs to. The way it works is:

  • calloc asks the OS for some memory to use
  • the OS looks in the process's page tables, and finds a chunk of memory that it's willing to assign. This is a fast operation: the OS just stores the memory address range in an internal data structure.
  • the program writes to one of the addresses.
  • the OS receives a page fault, at which point it looks and actually assigns the page to physical memory. A page is usually a few KiB in size.
  • the OS passes control back to the program, which proceeds without noticing the interruption.
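
You can watch this demand paging happen from Python with an anonymous mmap (a minimal sketch, assuming a Unix-like system; note that ru_maxrss is reported in KiB on Linux but in bytes on macOS):

import mmap
import resource

def rss_mib():
    # max resident set size; ru_maxrss is in KiB on Linux (bytes on macOS)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss // 1024

print("baseline RSS: ", rss_mib(), "MiB")

# Reserve 1 GiB of anonymous virtual memory. The OS only records the
# address range; no physical pages are assigned yet.
buf = mmap.mmap(-1, 1 << 30)
print("after mmap:   ", rss_mib(), "MiB")  # essentially unchanged

# Write one byte per 4 KiB page. Each first write triggers a page fault,
# and the OS backs that page with physical memory.
for offset in range(0, 1 << 30, 4096):
    buf[offset] = 1
print("after writing:", rss_mib(), "MiB")  # up by roughly 1 GiB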

Creating a single huge array doesn't work on Linux because, by default, a "heuristic algorithm is applied to figure out if enough memory is available". (thanks @Martijn Pieters!) Some experiments on my system show that for me, the kernel is unwilling to provide more than 0x3BAFFFFFF bytes. However, if I run echo 1 | sudo tee /proc/sys/vm/overcommit_memory, and then try the program in the OP again, it works fine.
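
If you want to check which policy your own Linux machine is using, the relevant settings are exposed under /proc (a Linux-only sketch; the exact numbers depend on your RAM and swap):

from pathlib import Path

# 0 = heuristic overcommit (the default), 1 = always overcommit,
# 2 = strict accounting against CommitLimit
mode = Path("/proc/sys/vm/overcommit_memory").read_text().strip()
print("vm.overcommit_memory =", mode)

# CommitLimit is the ceiling enforced in mode 2; Committed_AS is the
# amount of memory already promised to processes
for line in Path("/proc/meminfo").read_text().splitlines():
    if line.startswith(("CommitLimit", "Committed_AS")):
        print(line)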

For fun, try running arrays = [np.ones((21000, 21000)) for _ in range(0, 10000)]. You'll definitely get an out-of-memory error, even on macOS or Linux with swap compression. Yes, certain OSes can compress RAM, but they can't compress it to the level where you wouldn't run out of memory.
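
The difference between zeros and ones is easy to observe at a smaller scale (a sketch assuming a machine with a couple of GiB free: np.zeros can be served by calloc and leave its pages untouched, while np.ones writes 1.0 into every element and therefore faults every page in immediately):

import resource

import numpy as np

def rss_mib():
    # ru_maxrss is in KiB on Linux (bytes on macOS)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss // 1024

# Both arrays span ~2 GiB of virtual address space (16384^2 * 8 bytes)
z = np.zeros((16384, 16384))  # calloc-backed: pages stay unmapped
print("after np.zeros:", rss_mib(), "MiB")  # barely moves

o = np.ones((16384, 16384))   # fills with 1.0: every page is touched
print("after np.ones: ", rss_mib(), "MiB")  # up by roughly 2 GiB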
