Why can a 352GB NumPy ndarray be used on an 8GB memory macOS computer?


Question

import numpy as np

array = np.zeros((210000, 210000))  # default dtype is numpy.float64
array.nbytes  # 352800000000 bytes, i.e. about 352.8 GB


When I run the above code on my 8GB memory MacBook with macOS, no error occurs. But running the same code on a 16GB memory PC with Windows 10, or a 12GB memory Ubuntu laptop, or even on a 128GB memory Linux supercomputer, the Python interpreter will raise a MemoryError. All the test environments have 64-bit Python 3.6 or 3.7 installed.
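The 352 GB in the title follows directly from the array's shape and element size:

```python
# Each float64 element occupies 8 bytes, so the total size is
# simply the element count times 8.
elements = 210000 * 210000      # 44,100,000,000 elements
total_bytes = elements * 8      # 352,800,000,000 bytes
print(total_bytes / 10**9)      # 352.8 (GB)
```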

Answer


@Martijn Pieters' answer is on the right track, but not quite right: this has nothing to do with memory compression, but instead it has to do with virtual memory.


For example, try running the following code on your machine:

arrays = [np.zeros((21000, 21000)) for _ in range(0, 10000)]


This code allocates 32TiB of memory, but you won't get an error (at least I didn't, on Linux). If I check htop, I see the following:

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
31362 user       20   0 32.1T 69216 12712 S  0.0  0.4  0:00.22 python


This is because the OS is perfectly willing to overcommit on virtual memory. It won't actually assign pages to physical memory until it needs to. The way it works is:

  • calloc asks the OS for some memory to use
  • the OS looks in the process's page tables and finds a chunk of memory that it's willing to assign. This is a fast operation: the OS just stores the memory address range in an internal data structure.
  • the program writes to one of the addresses.
  • the OS receives a page fault, at which point it looks up and actually assigns the page to physical memory. A page is usually a few KiB in size.
  • the OS passes control back to the program, which proceeds without noticing the interruption.
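To get a feel for the page granularity mentioned above, you can query the page size from Python (a small sketch; the value is platform-dependent, typically 4 KiB on x86-64 and 16 KiB on Apple silicon):

```python
import resource  # Unix-only, which matches the Linux/macOS discussion here

page_size = resource.getpagesize()    # e.g. 4096 on x86-64 Linux
array_bytes = 210000 * 210000 * 8     # the 352.8 GB array from the question
pages = array_bytes // page_size      # pages the mapping spans; each one is
                                      # only backed by RAM after a page fault
print(page_size, pages)
```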


Creating a single huge array doesn't work on Linux because, by default, a "heuristic algorithm is applied to figure out if enough memory is available". (thanks @Martijn Pieters!) Some experiments on my system show that for me, the kernel is unwilling to provide more than 0x3BAFFFFFF bytes. However, if I run echo 1 | sudo tee /proc/sys/vm/overcommit_memory, and then try the program in the OP again, it works fine.
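Even with overcommit fully enabled, a request that exceeds the CPU's virtual address space (128 TiB of user space on typical x86-64 Linux) can never be mapped, and NumPy surfaces the failure as a MemoryError. A defensive sketch:

```python
import numpy as np

try:
    # 2**53 float64 elements would need 64 PiB of address space,
    # far beyond what any current 64-bit kernel can map.
    huge = np.zeros(2**53)
except MemoryError:
    huge = None

print(huge is None)  # True on ordinary 64-bit systems
```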


For fun, try running arrays = [np.ones((21000, 21000)) for _ in range(0, 10000)]. Unlike np.zeros, np.ones has to write a 1.0 into every element, so every page is touched and must actually be backed by physical memory. You'll definitely get an out-of-memory error, even on macOS or Linux with swap compression. Yes, certain OSes can compress RAM, but they can't compress it to the point where you never run out of memory.
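The zeros-versus-ones difference is observable with the Unix resource module: maximum RSS barely moves after a large np.zeros, but jumps by roughly the array size after np.ones. A sketch (note that ru_maxrss is reported in kilobytes on Linux but in bytes on macOS, so only the deltas are compared):

```python
import resource

import numpy as np

def max_rss():
    # Peak resident set size of this process; units vary by OS,
    # but we only compare deltas measured in the same units.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

n = 32 * 1024 * 1024                 # 32 Mi float64 elements = 256 MiB

before = max_rss()
zeros = np.zeros(n)                  # virtual reservation only, pages untouched
after_zeros = max_rss()
ones = np.ones(n)                    # writes 1.0 everywhere: every page faults in
after_ones = max_rss()

print(after_zeros - before, after_ones - after_zeros)
```

On a typical Linux box the first delta is near zero while the second is roughly 256 MiB worth of pages.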

