random.multivariate_normal on a dask array?
Question
I've been struggling to find a way to get this calculation working in a dask workflow.
I have code that uses the np.random.multivariate_normal function, and while many functions of this kind are available for dask arrays, this one does not seem to be. So I attempted to create my own, based on an example provided in the dask documentation.
Here is my attempt, which gives errors that I am having difficulty understanding. I have also provided random input variables to make it easy to replicate:
import numpy as np
from dask.distributed import Client
import dask.array as da
def mvn(mu, sigma, n, blocksize):
    chunks = ((blocksize,) * (n // blocksize),
              (blocksize,) * (n // blocksize))
    name = 'mvn'  # unique identifier
    dsk = {(name, i, j): (np.random.multivariate_normal(mu,sigma, blocksize))
                         if i == j else
                         (np.zeros, (blocksize, blocksize))
           for i in range(n // blocksize)
           for j in range(n // blocksize)}
    dtype = np.random.multivariate_normal(0).dtype  # take dtype default from numpy
    return da.Array(dsk, name, chunks, dtype)
n = 10000
A = da.random.normal(0, 1, size=(n,n), chunks=(1000, 1000))
sigma = da.dot(A,A.transpose())
mu = 4.0*da.ones(n, chunks = 1000)
R = da.numpy.random.mvn(mu, sigma, n, chunks=(100))
Any suggestions, or am I so far off the mark here that I should abandon all hope? Thanks!
Recommended answer
If you have a cluster to run this on, you can use my answer from this post, copied here for reference:
A workaround for now is to use a Cholesky decomposition. Note that any positive-definite covariance matrix C can be expressed as C = G*G', with G lower triangular. It then follows that x = G*y is correlated as specified in C if y is standard normal, since Cov(G*y) = G*I*G' = C (see this excellent post on Mathematics Stack Exchange). In code:
Numpy
import numpy as np

n_dim = 4
size = 100000
A = np.random.randn(n_dim, n_dim)
covm = A.dot(A.T)
x = np.random.multivariate_normal(size=size, mean=np.zeros(len(covm)), cov=covm)
## verify that NumPy's sample covariance is close to covm
np.cov(x, rowvar=False)
covm
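The Cholesky argument itself can be checked in plain NumPy before moving to dask. The following is a minimal sketch, not part of the original answer: the names `G`, `y`, and `x_chol` are illustrative, the seed is arbitrary, and the identity matrix is added to the covariance only to keep it safely positive definite.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability

n_dim = 4
size = 100_000

# Build a positive-definite covariance matrix (the identity term is an
# assumption of this sketch to keep it away from singularity)
A = rng.standard_normal((n_dim, n_dim))
covm = A @ A.T + np.eye(n_dim)

# Lower-triangular factor G with covm = G @ G.T
G = np.linalg.cholesky(covm)

# Standard normal draws, one sample per row
y = rng.standard_normal((size, n_dim))

# Row-wise form of x = G y: each row of x_chol has covariance G @ G.T = covm
x_chol = y @ G.T

# Compare the sample covariance with the target
sample_cov = np.cov(x_chol, rowvar=False)
print(np.abs(sample_cov - covm).max())  # should be small relative to covm's entries
```

Storing one sample per row gives the same layout that np.random.multivariate_normal returns, which is why the factor appears transposed on the right.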
Dask
import dask.array as da

## create covariance matrix (n_dim and size as defined in the NumPy example)
A = da.random.standard_normal(size=(n_dim, n_dim), chunks=(2, 2))
covm = A.dot(A.T)
## get the lower-triangular Cholesky factor, so that covm = L.dot(L.T)
L = da.linalg.cholesky(covm, lower=True)
## draw standard normal samples
sn = da.random.standard_normal(size=(size, n_dim), chunks=(100, 100))
## correct for correlation
x = L.dot(sn.T)
x.shape
## verify: the sample covariance should be close to covm
covm.compute()
da.cov(x, rowvar=True).compute()
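The question also uses a nonzero mean, which the snippets above leave out. One way to fold the same idea into a reusable function is sketched below. `mvn_dask` and its `chunks` parameter are hypothetical names for this illustration and are not part of dask; the sketch assumes the covariance is positive definite and, if passed as a dask array, is chunked into equal square blocks, as da.linalg.cholesky expects.

```python
import numpy as np
import dask.array as da


def mvn_dask(mean, cov, size, chunks=10_000):
    """Draw `size` samples from N(mean, cov) as a dask array of shape (size, n_dim).

    Hypothetical helper for illustration; not a dask API.
    """
    mean = da.asarray(mean)
    cov = da.asarray(cov)
    n_dim = mean.shape[0]

    # Lower-triangular factor G with cov = G.dot(G.T)
    G = da.linalg.cholesky(cov, lower=True)

    # Standard normal draws, one sample per row, chunked along the sample axis
    y = da.random.standard_normal(size=(size, n_dim), chunks=(chunks, n_dim))

    # Row-wise form of x = mean + G y
    return mean + y.dot(G.T)


# Small usage example with a fixed, known covariance
mean = np.array([4.0, -1.0, 0.5])
B = np.array([[2.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.3, -0.2, 1.5]])
cov = B @ B.T

samples = mvn_dask(mean, cov, size=200_000, chunks=50_000)
result = samples.compute()
print(result.mean(axis=0))
print(np.cov(result, rowvar=False))
```

The result is lazy until `.compute()` is called, so on a cluster the samples can stay distributed and feed directly into later dask operations.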