Very large matrices using Python and NumPy


Problem Description

NumPy is an extremely useful library, and from using it I've found that it's capable of handling matrices which are quite large (10000 x 10000) easily, but begins to struggle with anything much larger (trying to create a matrix of 50000 x 50000 fails). Obviously, this is because of the massive memory requirements.

Is there a way to create huge matrices natively in NumPy (say 1 million by 1 million) in some way (without having several terabytes of RAM)?

Recommended Answer

PyTables and NumPy are the way to go.

PyTables will store the data on disk in HDF format, with optional compression. My datasets often get 10x compression, which is handy when dealing with tens or hundreds of millions of rows. It's also very fast; my 5-year-old laptop can crunch through data doing SQL-like GROUP BY aggregation at 1,000,000 rows/second. Not bad for a Python-based solution!
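As a rough sketch of this approach (the file name, shape, and compression settings below are illustrative, not from the original answer), a large matrix can be written to disk in row chunks using a PyTables extendable array, so the full matrix never has to fit in RAM at once:

```python
import numpy as np
import tables as tb  # PyTables

# zlib at level 5 is one reasonable compression choice; blosc is another.
filters = tb.Filters(complib="zlib", complevel=5)

with tb.open_file("big_matrix.h5", mode="w") as f:
    # An extendable array: starts with 0 rows, 1000 columns,
    # and grows as chunks of rows are appended.
    arr = f.create_earray(f.root, "matrix",
                          atom=tb.Float64Atom(),
                          shape=(0, 1000),
                          filters=filters)
    for _ in range(100):
        # Only this 100 x 1000 chunk is in memory at any time.
        arr.append(np.random.rand(100, 1000))
```

After this loop the on-disk array holds 10000 x 1000 doubles, while peak memory use stayed at one chunk.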

Accessing the data as a NumPy recarray again is as simple as:

data = table[row_from:row_to]

The HDF library takes care of reading in the relevant chunks of data and converting to NumPy.
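A minimal, self-contained sketch of this read pattern (the file and node names here are hypothetical): slicing the on-disk node pulls only the requested rows from the HDF file and hands them back as an ordinary NumPy array.

```python
import numpy as np
import tables as tb  # PyTables

# Build a small example file first (sizes are illustrative).
with tb.open_file("example.h5", mode="w") as f:
    f.create_array(f.root, "matrix", np.arange(20000).reshape(2000, 10))

# Slicing reads only rows 500..509 into memory; the result is a
# plain NumPy array that remains valid after the file is closed.
with tb.open_file("example.h5", mode="r") as f:
    data = f.root.matrix[500:510]
    print(data.shape)  # (10, 10)
```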
