Can I create a multivariate_normal matrix using dask?


Question

Somewhat related to this post, I am trying to replicate multivariate_normal in dask. Using numpy, I can create a multivariate normal matrix with a specified covariance as follows:

import numpy as np
n_dim = 5
size = 300
A = np.random.randn(n_dim, n_dim) # a matrix
covm = A.dot(A.T) # A*A^T is positive semi-definite, as a covariance matrix
x = np.random.multivariate_normal(mean=np.zeros(n_dim), cov=covm, size=size) # generate data, shape (size, n_dim)

However, I need a significantly larger matrix, with n_dim = 4_500_000 and size = 100000. This will be expensive to compute, with respect to both CPU and memory. Fortunately, I have access to a Cloudera Data Science Workbench cluster and was trying to solve this using dask:

import dask.array as da
n_dim = 4_500_000
size = 100000
A = da.random.standard_normal((n_dim, n_dim))  
covm = A.dot(A.T)
#x = da.random.multivariate_normal(size=300, mean=np.zeros(len(covm)),cov=covm) # generate data
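
For a sense of scale, the sizes quoted above can be turned into a rough memory estimate. The sketch below assumes dense float64 storage (8 bytes per element), which the question does not state; the helper name `dense_bytes` is hypothetical and used only for illustration.

```python
# Rough memory estimate for the arrays in the question,
# assuming dense float64 storage (8 bytes per element).
n_dim = 4_500_000
size = 100_000
BYTES_PER_FLOAT64 = 8


def dense_bytes(rows, cols, itemsize=BYTES_PER_FLOAT64):
    """Bytes needed to store a dense rows x cols array (hypothetical helper)."""
    return rows * cols * itemsize


cov_bytes = dense_bytes(n_dim, n_dim)    # covariance matrix, n_dim x n_dim
sample_bytes = dense_bytes(size, n_dim)  # generated samples, size x n_dim

TB = 10**12
print(f"covariance: {cov_bytes / TB:.1f} TB")   # covariance: 162.0 TB
print(f"samples:    {sample_bytes / TB:.1f} TB")  # samples:    3.6 TB
```

Under that assumption the covariance matrix, not the sample matrix, dominates the cost, which is why the chunking of `covm` matters far more than the chunking of the draws.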

In the documentation, I cannot find any function that seems to do what I need. Does anyone know a solution or workaround, possibly using xarray or any other module that runs on clusters?

Recommended answer

A workaround for now is to use a Cholesky decomposition. Any positive-definite covariance matrix C can be factored as C = G*G', where G is lower triangular. It then follows that if y is standard normal, x = G*y is correlated as specified in C, because Cov(G*y) = G*Cov(y)*G' = G*G' = C (see this excellent post on Mathematics Stack Exchange). In code:

NumPy

import numpy as np

n_dim = 4
size = 100000
A = np.random.randn(n_dim, n_dim)
covm = A.dot(A.T)  # A*A^T is symmetric positive semi-definite

x = np.random.multivariate_normal(size=size, mean=np.zeros(len(covm)), cov=covm)
## verify that the sample covariance from numpy is close to covm
np.cov(x, rowvar=False)
covm
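
The Cholesky construction described in the answer can also be checked in plain NumPy before moving to dask. The following is a minimal sketch, not part of the original answer: the seeded generator, the small diagonal term added to keep the matrix strictly positive definite, and the name `max_rel_err` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the check is reproducible

n_dim = 4
size = 100_000

# Build a covariance matrix. The small multiple of the identity is an extra
# safeguard (an assumption, not in the original) to keep it positive definite.
A = rng.standard_normal((n_dim, n_dim))
covm = A @ A.T + 1e-6 * np.eye(n_dim)

# Lower-triangular Cholesky factor, so that covm == L @ L.T
L = np.linalg.cholesky(covm)

# Independent standard-normal draws, one sample per column
y = rng.standard_normal((n_dim, size))

# Correlate the draws: Cov(L @ y) = L @ L.T = covm
x = L @ y

# Empirical covariance (rows are variables) should be close to covm
emp_cov = np.cov(x)
max_rel_err = np.abs(emp_cov - covm).max() / np.abs(covm).max()
print(max_rel_err)
```

With 100,000 samples the relative error should be small, and it shrinks roughly with the square root of the sample count.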

Dask

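import dask.array as da

## create covariance matrix (n_dim and size as defined above)
A = da.random.standard_normal(size=(n_dim, n_dim), chunks=(2, 2))
covm = A.dot(A.T)

## get the lower-triangular Cholesky factor, covm = L * L'
L = da.linalg.cholesky(covm, lower=True)

## draw standard normal samples, shape (size, n_dim)
sn = da.random.standard_normal(size=(size, n_dim), chunks=(100, 100))

## correct for correlation
x = L.dot(sn.T)
x.shape  # (n_dim, size): one variable per row

## verify: the two matrices should be close
covm.compute()
da.cov(x, rowvar=True).compute()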
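
The steps above could be wrapped into a reusable function that also accepts a non-zero mean. This is only a sketch under stated assumptions: the function name `multivariate_normal_dask` and its `sample_chunks` parameter are hypothetical (dask does not provide them), the covariance is built in NumPy here purely to keep the check reproducible, and `cov` is assumed to be positive definite and stored in square chunks.

```python
import numpy as np
import dask.array as da


def multivariate_normal_dask(mean, cov, size, sample_chunks):
    """Hypothetical helper: draw `size` samples from N(mean, cov) with dask.

    mean          : 1-D array-like of length n_dim
    cov           : (n_dim, n_dim) dask array, positive definite, square chunks
    size          : number of samples (rows of the result)
    sample_chunks : chunk length along the sample axis
    """
    n_dim = cov.shape[0]
    # Lower-triangular factor with cov == L @ L.T
    L = da.linalg.cholesky(cov, lower=True)
    # Independent standard-normal draws, one sample per row; the column
    # chunks are matched to cov so the matrix product lines up block by block.
    z = da.random.standard_normal(
        size=(size, n_dim), chunks=(sample_chunks, cov.chunks[0])
    )
    # Row-wise form of x = L y: each row z_i becomes z_i @ L.T, then shift.
    return z.dot(L.T) + np.asarray(mean, dtype=float)


# Small usage example (sizes chosen for a quick local check)
n_dim, size = 4, 100_000
rng = np.random.default_rng(0)
A = rng.standard_normal((n_dim, n_dim))
covm_np = A @ A.T + 1e-6 * np.eye(n_dim)  # jitter is an added safeguard
cov = da.from_array(covm_np, chunks=(2, 2))
mean = np.arange(n_dim, dtype=float)

x = multivariate_normal_dask(mean, cov, size, sample_chunks=10_000)

# Evaluate both statistics in one pass over the task graph
emp_mean, emp_cov = da.compute(x.mean(axis=0), da.cov(x, rowvar=False))
max_rel_err = np.abs(emp_cov - covm_np).max() / np.abs(covm_np).max()
print(x.shape, max_rel_err)
```

Returning samples as rows matches the layout of `np.random.multivariate_normal`, so `rowvar=False` is used in the covariance check. For the sizes in the question, the Cholesky factorization of the covariance matrix remains the expensive step.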
