Python NUMPY HUGE Matrices multiplication


Question

I need to multiply two big matrices and sort their columns.

 import numpy
 a = numpy.random.rand(1000000, 100)
 b = numpy.random.rand(300000, 100)
 c = numpy.dot(b, a.T)
 # argsort lives in the numpy namespace; also avoid shadowing the builtin "sorted"
 top10 = [numpy.argsort(j)[:10] for j in c.T]

This process takes a lot of time and memory. Is there a way to speed this process up? If not, how can I calculate the RAM needed to do this operation? I currently have an EC2 box with 4GB RAM and no swap.

I was wondering if this operation can be serialized so that I don't have to store everything in memory.

Answer

One thing that you can do to speed things up is compile numpy with an optimized BLAS library such as ATLAS, GotoBLAS, or Intel's proprietary MKL.
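If you are not sure which BLAS your current numpy build is linked against, numpy can print its build configuration (the exact output format varies between numpy versions):

 import numpy as np
 np.show_config()  # lists the BLAS/LAPACK libraries numpy was built against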

To calculate the memory needed, you need to monitor Python's Resident Set Size ("RSS"). The following commands were run on a UNIX system (FreeBSD to be precise, on a 64-bit machine).

> ipython

In [1]: import numpy as np

In [2]: a = np.random.rand(1000, 1000)

In [3]: a.dtype
Out[3]: dtype('float64')

In [4]: del(a)

To get the RSS I ran:

ps -xao comm,rss | grep python

[See the ps manual page for a complete explanation of the options, but basically these ps options make it show only the command and resident set size of all processes. The equivalent format for Linux's ps would be ps -xao c,r, I believe.]
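You can also query the RSS from inside the process itself. A minimal sketch using the standard library's resource module (UNIX only); note that ru_maxrss is the peak RSS, whereas ps reports the current value:

 import resource

 # Peak resident set size of this process. The units are platform
 # dependent: kilobytes on Linux and FreeBSD, bytes on macOS.
 usage = resource.getrusage(resource.RUSAGE_SELF)
 print(usage.ru_maxrss)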

The results were:

  • After starting the interpreter: 24880 kiB
  • After importing numpy: 34364 kiB
  • After creating a: 42200 kiB
  • After deleting a: 34368 kiB

Calculating the size:

In [4]: (42200 - 34364) * 1024
Out[4]: 8024064

In [5]: 8024064/(1000*1000)
Out[5]: 8.024064

As you can see, the calculated size matches the 8 bytes per element of the default datatype float64 quite well. The difference is internal overhead.

The size of your original arrays in MiB will be approximately:

In [11]: 8*1000000*100/1024**2
Out[11]: 762.939453125

In [12]: 8*300000*100/1024**2
Out[12]: 228.8818359375
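As a cross-check, a numpy array reports the exact size of its data buffer through its nbytes attribute, which agrees with the hand calculation above (the In/Out numbering here is illustrative, continuing the session):

In [13]: a = np.random.rand(1000000, 100)

In [14]: a.nbytes / 1024**2
Out[14]: 762.939453125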

That's not too bad. However, the dot product will be way too large:

In [19]: 8*1000000*300000/1024**3
Out[19]: 2235.1741790771484

That's 2235 GiB!

What you can do is split up the problem and perform the dot operation in pieces:

  • load b as an ndarray
  • load every row from a as an ndarray in turn.
  • multiply b by that row (this yields one column of the product) and write the result to a file.
  • del() the row and load the next row.

This will not make it faster, but it will make it use less memory!

In this case I would suggest writing the output file in binary format (e.g. using struct or ndarray.tofile). That would make it much easier to read a column back from the file, e.g. with numpy.memmap.
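Putting this together, a minimal sketch of the chunked approach, writing the partial products into a numpy.memmap. The shapes here are scaled down so the example runs quickly; the chunk size and the filename c.dat are illustrative, and at the question's full size the output file would need roughly 2.2 TiB of disk:

 import numpy as np

 # Scaled-down stand-ins for the arrays in the question.
 a = np.random.rand(10000, 100)
 b = np.random.rand(3000, 100)

 rows, cols = b.shape[0], a.shape[0]  # c = dot(b, a.T) has shape (rows, cols)
 c = np.memmap('c.dat', dtype='float64', mode='w+', shape=(rows, cols))

 chunk = 100  # how many columns of c to compute per step
 for start in range(0, cols, chunk):
     stop = min(start + chunk, cols)
     # only a (rows, chunk) slab is held in memory at any one time
     c[:, start:stop] = np.dot(b, a[start:stop].T)
 c.flush()

 # read the columns back one at a time for the argsort step
 top10 = [np.argsort(c[:, j])[:10] for j in range(cols)]

Since the memmap is row-major by default, reading a single column is strided; if the argsort pass dominates, it may be worth creating the memmap with order='F' so that columns are contiguous on disk.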
