numpy multivariate_normal bug when dimension too high


Problem description

I am working on a homework assignment and noticed that when the dimension of the mean and covariance is very high, multivariate_normal occupies all CPU seemingly forever without producing any result.

Sample code snippet:

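import numpy as np
from numpy.random import multivariate_normal

p = 5000  # dimension at which the call appears to hang
cov_true = np.eye(p)
mean_true = np.zeros(p)
beta_true = multivariate_normal(mean_true, cov_true, size=1).T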

When p = 5000, this runs forever. Environment: Python 3.4 and Python 3.5, NumPy 1.11.0.

Is this really a bug, or did I miss something?

Recommended answer

What takes so much time?

To account for relations between variables, NumPy computes the singular value decomposition of your covariance matrix, and this takes the majority of the time (the underlying GESDD is in general Θ(n³), and 5000³ = 1.25 × 10¹¹ is already a lot).

How to speed it up?

In the simplest case, with all variables independent, you could just use random.normal:

from numpy.random import normal

sample = normal(means, deviations, len(means))
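
As a minimal sketch of the independent case, the snippet below mirrors the question's setup (identity covariance, p = 5000). The names `variances` and `rng` are illustrative assumptions, and `np.random.default_rng` assumes NumPy 1.17 or newer. The point worth noting is that `normal` takes standard deviations, so the diagonal of the covariance matrix has to be square-rooted first.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

p = 5000
means = np.zeros(p)
variances = np.ones(p)          # diagonal of the covariance matrix, np.eye(p) here

# normal() expects standard deviations, not variances.
deviations = np.sqrt(variances)

# One draw of a p-dimensional vector with independent components.
sample = rng.normal(means, deviations)
```

No matrix factorization is involved here, so the cost grows linearly with p.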

Otherwise, if your covariance matrix happens to be full rank (hence positive-definite), replace the SVD with a Cholesky decomposition (still Θ(n³) in general, but with a smaller constant):

from numpy.random import standard_normal
from scipy.linalg import cholesky

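# lower=True gives L with L.dot(L.T) == covariances; SciPy's default returns the upper factor instead.
# Note: overwrite_a=True allows SciPy to overwrite the contents of covariances.
l = cholesky(covariances, lower=True, check_finite=False, overwrite_a=True)
sample = means + l.dot(standard_normal(len(means)))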
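
To check that a Cholesky-based sampler reproduces the intended covariance, a small experiment along these lines may help. It is only a sketch: the 4×4 test matrix, the sample count, and the use of `np.random.default_rng` (NumPy 1.17+) are assumptions made for illustration. It relies on the lower factor L, for which L·Lᵀ equals the covariance matrix.

```python
import numpy as np
from scipy.linalg import cholesky

rng = np.random.default_rng(0)

n = 4
a = rng.standard_normal((n, n))
covariances = a @ a.T + n * np.eye(n)   # symmetric positive-definite by construction
means = np.arange(n, dtype=float)

# Lower factor: l @ l.T == covariances
l = cholesky(covariances, lower=True)

# Draw many samples at once: each row is means + l @ z for a fresh standard normal z.
z = rng.standard_normal((200_000, n))
samples = means + z @ l.T

# The empirical covariance should be close to the target.
empirical_cov = np.cov(samples, rowvar=False)
max_abs_error = np.abs(empirical_cov - covariances).max()
```

With the upper factor in place of `l`, the same check would generally fail, which is why the triangle matters.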

If the matrix may be singular (as is sometimes the case), then either wrap SPSTRF or consider helping with scipy#6202.
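
Until a pivoted Cholesky is available, one possible stopgap for a singular (positive semi-definite) covariance matrix is a symmetric eigendecomposition. This is a swapped-in technique, not the SPSTRF route mentioned above, and it is still Θ(n³); the rank-2 test matrix and variable names below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# A rank-deficient covariance: 5x5 but only rank 2, so plain Cholesky would fail.
b = rng.standard_normal((5, 2))
covariances = b @ b.T
means = np.zeros(5)

# Symmetric eigendecomposition: covariances == v @ diag(w) @ v.T
w, v = np.linalg.eigh(covariances)
w = np.clip(w, 0.0, None)       # clear tiny negative eigenvalues caused by round-off

# Scale each eigenvector column by sqrt(eigenvalue): factor @ factor.T == covariances
factor = v * np.sqrt(w)

sample = means + factor @ rng.standard_normal(len(means))
```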

Cholesky will likely be noticeably faster. If that is still not sufficient, you could investigate whether the matrix can be decomposed analytically, or try a different underlying library (such as ACML, MKL, or cuSOLVER).
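
As one hypothetical illustration of an analytic decomposition, suppose the covariance has the AR(1) form cov[i, j] = rho^|i − j| (this structure is an assumption, not something taken from the question). Its Cholesky factor is known in closed form, so a sample can be drawn with a simple recurrence in linear time, without factorizing anything. The helper names `sample_ar1` and `ar1_factor` are made up for this sketch.

```python
import numpy as np

def sample_ar1(mean, rho, rng):
    """Draw one sample from N(mean, cov) with cov[i, j] = rho ** abs(i - j)."""
    z = rng.standard_normal(len(mean))
    x = np.empty_like(z)
    x[0] = z[0]
    scale = np.sqrt(1.0 - rho ** 2)
    for i in range(1, len(z)):
        # Each step keeps unit variance: rho**2 + (1 - rho**2) == 1
        x[i] = rho * x[i - 1] + scale * z[i]
    return mean + x

def ar1_factor(n, rho):
    """Closed-form lower Cholesky factor of the same covariance (for checking only)."""
    i, j = np.indices((n, n))
    l = np.where(i >= j, rho ** np.abs(i - j) * np.sqrt(1.0 - rho ** 2), 0.0)
    l[:, 0] = rho ** np.arange(n)   # the first column is not rescaled
    return l

rng = np.random.default_rng(0)
sample = sample_ar1(np.zeros(5000), 0.5, rng)
```

The recurrence computes the same product as multiplying the closed-form factor by a standard normal vector, but never builds the 5000×5000 matrix.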
