Very large matrices using Python and NumPy

Problem description

NumPy is an extremely useful library, and from using it I've found that it's capable of handling matrices which are quite large (10000 x 10000) easily, but begins to struggle with anything much larger (trying to create a matrix of 50000 x 50000 fails). Obviously, this is because of the massive memory requirements.

Is there a way to create huge matrices natively in NumPy (say 1 million by 1 million) without needing several terabytes of RAM?

Recommended answer

PyTables and NumPy are the way to go.

PyTables will store the data on disk in HDF format, with optional compression. My datasets often get 10x compression, which is handy when dealing with tens or hundreds of millions of rows. It's also very fast; my 5-year-old laptop can crunch through data, doing SQL-like GROUP BY aggregation at 1,000,000 rows/second. Not bad for a Python-based solution!
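To make this concrete, here is a minimal sketch of creating a compressed, disk-backed array with PyTables; the file name matrix.h5, the array shape, and the chunk size are illustrative assumptions, not part of the original answer. Because HDF5 allocates chunks lazily, the full matrix never has to fit in RAM:

import numpy as np
import tables

# Sketch: a huge chunked, compressed array that lives on disk, not in RAM.
h5file = tables.open_file("matrix.h5", mode="w")        # hypothetical file name
filters = tables.Filters(complevel=5, complib="zlib")   # optional compression
carray = h5file.create_carray(h5file.root, "matrix",
                              atom=tables.Float64Atom(),
                              shape=(1_000_000, 1_000_000),  # illustrative size
                              chunkshape=(1000, 1000),
                              filters=filters)

# Only the chunks that are actually written get stored on disk.
carray[0:1000, 0:1000] = np.random.rand(1000, 1000)

h5file.close()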

Accessing the data as a NumPy recarray again is as simple as:

data = table[row_from:row_to]

The HDF library takes care of reading in the relevant chunks of data and converting to NumPy.
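As a self-contained illustration of that access pattern, here is a small sketch of a round trip through PyTables; the file name data.h5, the table name readings, and the column layout are hypothetical, chosen only for the example:

import numpy as np
import tables

# Hypothetical two-column table layout, purely for illustration.
class Reading(tables.IsDescription):
    timestamp = tables.Int64Col()
    value = tables.Float64Col()

h5file = tables.open_file("data.h5", mode="w")
table = h5file.create_table(h5file.root, "readings", Reading,
                            filters=tables.Filters(complevel=5))

# Append rows from a NumPy structured array matching the table's dtype.
rows = np.zeros(1000, dtype=table.dtype)
rows["timestamp"] = np.arange(1000)
rows["value"] = np.random.rand(1000)
table.append(rows)
table.flush()

# Slicing the table reads only that range from disk and returns a
# NumPy structured array.
data = table[100:200]
print(data["value"].mean())

h5file.close()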
