Is there an "enhanced" numpy/scipy dot method?



Problem

I would like to compute the following using numpy or scipy:

Y = A**T * Q * A

where A is an m x n matrix, A**T is the transpose of A, and Q is an m x m diagonal matrix.

Since Q is a diagonal matrix I store only its diagonal elements as a vector.

Ways of solving for Y

Currently I can think of two ways of how to calculate Y:

  1. Y = np.dot(np.dot(A.T, np.diag(Q)), A) and
  2. Y = np.dot(A.T * Q, A).

Clearly option 2 is better than option 1, since no actual matrix has to be created with diag(Q) (if that is indeed what numpy does...).
However, both methods suffer from the defect of allocating more memory than is really necessary, since A.T * Q and np.dot(A.T, np.diag(Q)) have to be stored along with A in order to calculate Y.

Question

Is there a method in numpy/scipy that would eliminate this unnecessary allocation of extra memory, where you would only pass two matrices A and B (in my case B is A.T) along with a weighting vector Q?

Solution

(W/r/t the OP's last sentence: I am not aware of such a numpy/scipy method, but w/r/t the question in the OP's title (i.e., improving NumPy dot performance), what follows should be of some help. In other words, my answer is directed at improving the performance of most of the steps comprising your function for Y.)

First, this should give you a noticeable boost over the vanilla NumPy dot method:

>>> from scipy.linalg import blas as FB
>>> vx = FB.dgemm(alpha=1., a=v1, b=v2, trans_b=True)
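The snippet above assumes v1 and v2 are already defined; a self-contained sketch (shapes and names here are purely illustrative) that checks the BLAS call against plain np.dot:

```python
import numpy as np
from scipy.linalg import blas as FB

rng = np.random.default_rng(0)
# Fortran-ordered inputs, so dgemm can consume them without copying
v1 = np.asfortranarray(rng.standard_normal((500, 300)))
v2 = np.asfortranarray(rng.standard_normal((400, 300)))

# with trans_b=True, dgemm computes alpha * (v1 @ v2.T)
vx = FB.dgemm(alpha=1., a=v1, b=v2, trans_b=True)

assert vx.shape == (500, 400)
assert np.allclose(vx, np.dot(v1, v2.T))
```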

Note that the two arrays v1 and v2 must both be in Fortran (F_CONTIGUOUS) order.

You can inspect the memory layout of a NumPy array through its flags attribute, like so:

>>> import numpy as NP
>>> c = NP.ones((4, 3))
>>> c.flags
      C_CONTIGUOUS : True          # refers to C-contiguous order
      F_CONTIGUOUS : False         # Fortran-contiguous
      OWNDATA : True
      WRITEABLE : True
      ALIGNED : True
      WRITEBACKIFCOPY : False

To change the order of one of the arrays so that both are aligned, just call the NumPy array constructor, passing in the array with the order argument set to "F":

>>> c = NP.array(c, order="F")

>>> c.flags
      C_CONTIGUOUS : False
      F_CONTIGUOUS : True
      OWNDATA : True
      WRITEABLE : True
      ALIGNED : True
      WRITEBACKIFCOPY : False

You can further optimize by exploiting array-order alignment to reduce excess memory consumption caused by copying the original arrays.

But why are the arrays copied before being passed to dot?

The dot product relies on BLAS operations. BLAS routines expect matrices in Fortran (column-major) order--it's this constraint that causes the arrays to be copied when they are passed in C order.

The transpose, on the other hand, does not cause a copy: transposing a C-ordered array simply returns a Fortran-ordered view of the same data.

Therefore, to remove the performance bottleneck, you need to eliminate the preceding array-copying step; to do that, just pass both arrays to dgemm already in Fortran-contiguous order.
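That the transpose is copy-free is easy to verify; a small sketch using np.shares_memory:

```python
import numpy as np

A = np.ones((4, 3))   # C-contiguous by default
At = A.T              # a view: no data is copied

assert A.flags['C_CONTIGUOUS'] and not A.flags['F_CONTIGUOUS']
assert At.flags['F_CONTIGUOUS'] and not At.flags['C_CONTIGUOUS']
assert np.shares_memory(A, At)   # same underlying buffer
```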

So, to calculate dot(A.T, A) without making an extra copy:

>>> import scipy.linalg.blas as FB
>>> vx = FB.dgemm(alpha=1.0, a=A.T, b=A.T, trans_b=True)

In sum, the expression just above (along with the import statement) can substitute for dot, supplying the same functionality with better performance.

You can bind that expression to a function like so:

>>> super_dot = lambda v, w: FB.dgemm(alpha=1., a=v.T, b=w.T, trans_b=True)
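Applied back to the original Y = A**T * Q * A: the diagonal weighting can be folded in without ever materializing diag(Q). One route (np.einsum, a different tool than dgemm, shown here only as a sketch) expresses the triple product directly:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 200, 50
A = rng.standard_normal((m, n))
Q = rng.standard_normal(m)   # diagonal entries of the m x m weight matrix

# Y[j, k] = sum_i A[i, j] * Q[i] * A[i, k]  ==  A.T @ diag(Q) @ A
Y = np.einsum('ij,i,ik->jk', A, Q, A)

assert np.allclose(Y, A.T @ np.diag(Q) @ A)
```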
