Compute the mean and the covariance of a large matrix (300000 x 70000)

Views: 87 | Published: 2020/4/30 12:06:20 | Tags: python numpy matrix linear-algebra


Problem description

I am using NumPy and trying to compute the mean and the covariance of a large matrix (300000 x 70000). I have 32 GB of memory available. What is the best practice for this task in terms of computational efficiency and ease of implementation?

My current implementation is as follows:

import numpy as np

def compute_mean_variance(mat, chunk_size):
    row_count = mat.row_count
    col_count = mat.col_count
    # maintain the `x_sum`, `x2_sum` arrays (per-column running sums)
    # mean(x) = x_sum / row_count
    # var(x) = x2_sum / row_count - mean(x)**2
    x_sum = np.zeros([1, col_count])
    x2_sum = np.zeros([1, col_count])

    # process the matrix in blocks of `chunk_size` rows
    for i in range(0, row_count, chunk_size):
        sub_mat = mat[i:i+chunk_size, :]
        # in-memory sub_mat of size chunk_size x num_cols
        sub_mat = sub_mat.read().val
        x_sum += np.sum(sub_mat, 0)
        # accumulate only the new chunk's sum of squares
        # (`x2_sum += x2_sum + ...` would double the running total each step)
        x2_sum += np.sum(sub_mat**2, 0)
    x_mean = x_sum / row_count
    x_var = x2_sum / row_count - x_mean ** 2
    return x_mean, x_var

Any suggestions for improvement?
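The formula var(x) = x2_sum / row_count - mean(x)**2 subtracts two large, nearly equal numbers when the mean is large relative to the spread, which can lose precision. A minimal sketch of one alternative for row chunks is to keep a running mean and a running sum of squared deviations, and merge each chunk's statistics into them (the pairwise update usually attributed to Chan et al.). The function name `merge_chunk_stats` and the assumption that chunks arrive as plain in-memory NumPy arrays are illustrative and not part of the original code.

```python
import numpy as np

def merge_chunk_stats(chunks):
    """Column-wise mean and population variance from an iterable of row chunks.

    `chunks` is assumed to yield 2-D arrays that all have the same number
    of columns (hypothetical interface, for illustration only).
    """
    n = 0
    mean = None
    m2 = None  # per-column sum of squared deviations from the running mean
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=np.float64)
        n_b = chunk.shape[0]
        if n_b == 0:
            continue
        mean_b = chunk.mean(axis=0)
        m2_b = ((chunk - mean_b) ** 2).sum(axis=0)
        if mean is None:
            n, mean, m2 = n_b, mean_b, m2_b
            continue
        # merge the statistics of the rows seen so far with the new chunk
        delta = mean_b - mean
        total = n + n_b
        mean = mean + delta * (n_b / total)
        m2 = m2 + m2_b + delta ** 2 * (n * n_b / total)
        n = total
    if mean is None:
        raise ValueError("no rows were provided")
    return mean, m2 / n  # divides by n, like np.var with its default ddof=0
```

Each chunk is centered on its own mean before squaring, so the accumulated values stay on the scale of the deviations rather than the scale of the raw data.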

I find the following implementation easier to understand. It also uses NumPy to calculate the mean and standard deviation for chunks of columns, so it should be more efficient and numerically stable.

import numpy as np

def compute_mean_std(mat, chunk_size):
    row_count = mat.row_count
    col_count = mat.col_count
    mean = np.zeros(col_count)
    std = np.zeros(col_count)

    # process the matrix in blocks of `chunk_size` columns
    # (`range` replaces the Python 2-only `xrange`)
    for i in range(0, col_count, chunk_size):
        sub_mat = mat[:, i : i + chunk_size]
        # num_samples x chunk_size
        sub_mat = sub_mat.read().val
        mean[i : i + chunk_size] = np.mean(sub_mat, axis=0)
        std[i : i + chunk_size] = np.std(sub_mat, axis=0)

    return mean, std
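
The code so far yields per-column means and variances, while the question also asks for the covariance. A full 70000 x 70000 covariance matrix in float64 takes 70000 * 70000 * 8 bytes, about 39.2 GB, which is more than the 32 GB available, so it may have to be produced block by block or stored in float32. The following is a minimal sketch of the column-block idea, assuming each block holds all rows of a subset of columns and fits in memory. The names `cov_block` and `blockwise_cov` are hypothetical, and the demo assembles the result from a plain NumPy array rather than from the `mat[...].read().val` interface used in the question.

```python
import numpy as np

def cov_block(cols_a, cols_b, ddof=1):
    """Covariance between the columns of two in-memory column blocks."""
    a = np.asarray(cols_a, dtype=np.float64)
    b = np.asarray(cols_b, dtype=np.float64)
    if a.shape[0] != b.shape[0]:
        raise ValueError("both blocks must have the same number of rows")
    # center each column, then take the cross-products
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return a.T @ b / (a.shape[0] - ddof)

def blockwise_cov(x, chunk_size):
    """Assemble the full covariance from column blocks (small demo only)."""
    n_cols = x.shape[1]
    cov = np.empty((n_cols, n_cols))
    for i in range(0, n_cols, chunk_size):
        # only the upper triangle of blocks is computed
        for j in range(i, n_cols, chunk_size):
            block = cov_block(x[:, i:i + chunk_size], x[:, j:j + chunk_size])
            cov[i:i + chunk_size, j:j + chunk_size] = block
            cov[j:j + chunk_size, i:i + chunk_size] = block.T  # symmetry
    return cov
```

With the real matrix, each `cov_block` result could be written to disk instead of being stored in one array. Note that `cov_block` divides by n - 1 by default, like `np.cov`, whereas `np.std` and `np.var` divide by n unless `ddof` is set.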
