R中var-covar矩阵的高效计算 [英] Efficient calculation of var-covar matrix in R

查看：21 发布时间：2022/1/7 23:34:27 r performance statistics data.table panel-data

本文介绍了R中var-covar矩阵的高效计算的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在寻找通过 t, t-1 等随时间t 的单个测量计算(自动)协方差矩阵的效率增益.>

在数据矩阵中，每一行代表一个人，每一列代表每月的测量值(列按时间顺序排列).类似于以下数据(尽管有更多的协方差).

#模拟数据set.seed(1)周期<- 70L工业 <- 90000Lmat <- sapply(rep(ind, period), rnorm)

下面是我想出的(丑陋的)代码来获得测量/滞后测量的协方差矩阵.运行大约需要 4 秒.我确信通过移动到 data.table，多思考而不依赖循环，我可以大大减少时间.但是由于协方差矩阵无处不在，我怀疑在 R 中已经存在一种标准(且有效)的方法来做到这一点，我应该首先了解它.

# 获取 0-5 滞后的方差协方差矩阵n_lags <- 5L # 滞后数vcov <- 矩阵(0，nrow = n_lags + 1L，ncol = n_lags + 1)for (i in 0L:n_lags) {for (j in i:n_lags) {vcov[j + 1L, i + 1L] <-sum(mat[, (1L + (j - i)):(periods - i)] *垫 [, 1L:(句点 - j)])/(ind * (句点 - j) - 1)}}轮(vcov，3)[,1] [,2] [,3] [,4] [,5] [,6][1,] 1.001 0.000 0.000 0.000 0.000 0.000[2,] 0.000 1.001 0.000 0.000 0.000 0.000[3,] 0.000 0.000 1.001 0.000 0.000 0.000[4,] 0.000 0.000 0.000 1.001 0.000 0.000[5,] -0.001 0.000 0.000 0.000 1.001 0.000[6,] 0.000 -0.001 0.000 0.000 0.000 1.001

解决方案

@F.Privé 的 Rcpp 实现是一个很好的起点，但我们可以做得更好.您会注意到在 OP 提供的主要算法中，有许多重复的相当昂贵的计算.观察:

OPalgo <- function(m, p, ind1, n) {vcov <- 矩阵(0，nrow = n + 1L，ncol = n + 1)for (i in 0L:n) {for (j in i:n) {## 第一个 & 的下限和上限范围第二个被乘数打印(粘贴(c((1L +(j - i))，:"，(句点 - i)，",1L,":",(句点 - j)), collapse = ""))vcov[j + 1L, i + 1L] <-sum(mat[, (1L + (j - i)):(periods - i)] *垫 [, 1L:(句点 - j)])/(ind * (句点 - j) - 1)}}病毒}OPalgo(垫子，句点，ind，n_lags)[1] "1:70 1:70" ## 包含 "1:65 1:65"[1] 2:70 1:69"[1] 3:70 1:68"[1] 4:70 1:67"[1] 5:70 1:66"[1]6:70 1:65"[1] "1:69 1:69" ## 包含 "1:65 1:65"[1] 2:69 1:68"[1] 3:69 1:67"[1] 4:69 1:66"[1] 5:69 1:65"[1] "1:68 1:68" ## 包含 "1:65 1:65"[1] 2:68 1:67"[1] "3:68 1:66"[1] 4:68 1:65"[1] "1:67 1:67" ## 包含 "1:65 1:65"[1] 2:67 1:66"[1] 3:67 1:65"[1] "1:66 1:66" ## 包含 "1:65 1:65"[1] 2:66 1:65"[1] "1:65 1:65"

如您所见，乘积 mat[,1:65] * mat[,1:65] 在上面执行了 6 次.第一次出现和最后一次出现之间的唯一区别是第一次出现有额外的 5 列.所以不是计算:

sum(mat[ , 1:70] * mat[ , 1:70])sum(mat[, 1:69] * mat[, 1:69])sum(mat[, 1:68] * mat[, 1:68])sum(mat[, 1:67] * mat[, 1:67])sum(mat[, 1:66] * mat[, 1:66])sum(mat[, 1:65] * mat[, 1:65])

我们可以计算 preCalc[1] <- sum(mat[ , 1:65] * mat[ , 1:65]) 一次，然后在其他 5 次计算中使用它，像这样:

preCalc[1] + sum(mat[ , 66:70] * mat[ , 66:70])preCalc[1] + sum(mat[, 66:69] * mat[, 66:69])preCalc[1] + sum(mat[, 66:68] * mat[, 66:68])preCalc[1] + sum(mat[, 66:67] * mat[, 66:67])preCalc[1] + sum(mat[, 66:66] * mat[, 66:66])

在上面的每一个中，我们减少了90000 * 65 = 5,850,000的乘法次数和5,850,000 - 1 = 5,849,999的加法次数保存了 11,699,999 个算术运算.下面的函数实现了这一点.

fasterAlgo <- function(m, p, ind1, n) {vcov <- 矩阵(0，nrow = n + 1L，ncol = n + 1)preCals <- vapply(1:(n + 1L), function(x) sum(m[, x:(p - n + x - 2L)] *m[, 1L:(p - n - 1L)]), 42.42)for (i in 0L:n) {for (j in i:n) {myNum <- preCals[1L + j - i] + sum(m[, (p - n + j - i):(p - i)] * m[, (p - n):(p - j)])vcov[j + 1L, i + 1L] <- myNum/(ind * (p - j) - 1)}}病毒}## 输出相同的结果all.equal(OPalgo(mat, period, ind, n_lags), fastAlgo(mat, period, ind, n_lags))[1] 真

基准:

## 我在基准测试之前注释掉了 OPalgo 的打印语句图书馆(微基准)微基准(OP = OPalgo(垫子，句号，ind，n_lags)，fastBase = fastAlgo(mat, period, ind, n_lags),RcppOrig = compute_vcov(mat, n_lags), times = 5)单位:毫秒expr min lq 平均中位数 uq max neval cldOP 2775.6110 2780.7207 2843.6012 2784.976 2899.7621 2976.9356 5 cfastBase 863.3897 863.9681 865.5576 865.593 866.7962 868.0409 5 bRcppOrig 160.1040 161.8922 162.0153 162.235 162.4756 163.3697 5 a

如您所见，通过此修改，我们看到至少有 3 倍的改进，但 Rcpp 仍然要快得多.让我们在Rcpp中实现上面的概念.

//[[Rcpp::export]]NumericMatrix compute_vcov2(const NumericMatrix& mat, int n_lags) {NumericMatrix vcov(n_lags + 1, n_lags + 1);std::vector预计算；preCalcs.reserve(n_lags + 1);双 myCov;int i, j, k1, k2, l;int n = mat.nrow();int m = mat.ncol();for (i = 0; i <= n_lags; i++) {我的冠状病毒 = 0;对于 (k1 = i, k2 = 0; k2 <(m - n_lags - 1); k1++, k2++) {for (l = 0; l

新基准:

microbenchmark(OP = OPalgo(mat, period, ind, n_lags),fastBase = fastAlgo(mat, period, ind, n_lags),RcppOrig = compute_vcov(mat, n_lags),RcppModified = compute_vcov2(mat, n_lags), times = 5)单位:毫秒expr min lq 平均中位数 uq max neval cldOP 2785.4789 2786.67683 2811.02528 2789.37719 2809.61270 2883.98073 5 天fastBase 866.5601 868.25555 888.64418 869.31796 870.92308 968.16417 5 cRcppOrig 160.3467 161.37992 162.74899 161.73009 164.38653 165.90174 5 bRcpp 修改 51.1641 51.67149 52.87447 52.56067 53.06273 55.91334 5 a

现在增强的 Rcpp 解决方案比原始 Rcpp 解决方案快 3 倍左右，比 OP 提供的原始算法快 50 倍左右.

更新

我们可以做得更好.我们可以颠倒索引 i/j 的范围，从而不断更新 preCalcs.这允许每次迭代最多只计算一个新列的乘积.随着 n_lags 的增加，这确实发挥了作用.观察:

//[[Rcpp::export]]NumericMatrix compute_vcov3(const NumericMatrix& mat, int n_lags) {NumericMatrix vcov(n_lags + 1, n_lags + 1);std::vector预计算；preCalcs.reserve(n_lags + 1);int i, j, k1, k2, l;int n = mat.nrow();int m = mat.ncol();for (i = 0; i <= n_lags; i++) {preCalcs.push_back(0);对于 (k1 = i, k2 = 0; k2 <(m - n_lags); k1++, k2++) {for (l = 0; l = 0; i--) { ## 反向范围for (j = n_lags; j >= i; j--) { ## 反向范围vcov(i, j) = vcov(j, i) = preCalcs[j - i]/(n * (m - j) - 1);如果 (i > 0 && i > 0) {对于 (k1 = m - i, k2 = m - j; k2 <= (m - j); k1++, k2++) {for (l = 0; l

Rcpp 仅基准:

n_lags <- 50L微基准(RcppOrig = compute_vcov(mat, n_lags),RcppModified = compute_vcov2(mat, n_lags),RcppExtreme = compute_vcov3(mat, n_lags), times = 5)单位:毫秒expr min lq 平均中位数 uq max neval cldRcppOrig 7035.7920 7069.7761 7083.4961 7070.3395 7119.028 7122.5446 5 cRcpp 修改 3608.8986 3645.8585 3653.0029 3654.7209 3663.716 3691.8202 5 bRcppExtreme 324.8252 330.7381 332.9657 333.5919 335.168 340.5054 5 a

最新的实现现在比原始 Rcpp 版本快 20 多倍，并且当 n-lags 很大时比原始算法快 300 多倍.

I'm looking for efficiency gains in calculating the (auto)covariance matrix from individual measurements over time t with t, t-1, etc..



In the data matrix, each row represents an individual and each column represents monthly measurements (the columns are in time order). Similar to the following data (although with some more co-variance).
# simulate data
set.seed(1)
periods <- 70L
ind <- 90000L
mat <- sapply(rep(ind, periods), rnorm)
Below is the (ugly) code I came up with to get the covariance matrix for measurements/ lagged measurements. It takes almost 4 seconds to run. I'm sure that by moving to data.table, thinking more and not relying on loops I could cut the time by a big amount. But since covariance matrices are ubiquitous I suspect there already exists a standard (and efficient) way to do this in R that I should know about first.
# Get variance covariance matrix for 0-5 lags    
n_lags <- 5L # Number of lags
vcov <- matrix(0, nrow = n_lags + 1L, ncol = n_lags + 1)
for (i in 0L:n_lags) {
  for (j in i:n_lags) {
    vcov[j + 1L, i + 1L] <- 
      sum(mat[, (1L + (j - i)):(periods - i)] *
          mat[, 1L:(periods - j)]) /
      (ind * (periods - j) - 1)
  }
}
round(vcov, 3)

       [,1]   [,2]  [,3]  [,4]  [,5]  [,6]
[1,]  1.001  0.000 0.000 0.000 0.000 0.000
[2,]  0.000  1.001 0.000 0.000 0.000 0.000
[3,]  0.000  0.000 1.001 0.000 0.000 0.000
[4,]  0.000  0.000 0.000 1.001 0.000 0.000
[5,] -0.001  0.000 0.000 0.000 1.001 0.000
[6,]  0.000 -0.001 0.000 0.000 0.000 1.001

 解决方案 
@F. Privé's Rcpp implementation is a good starting place, but we can do better. You will notice in the main algorithm supplied by the OP that there are many replicated fairly expensive calculations.  Observe:
OPalgo <- function(m, p, ind1, n) {
    vcov <- matrix(0, nrow = n + 1L, ncol = n + 1)
    for (i in 0L:n) {
        for (j in i:n) {
            ## lower and upper range for the first & second multiplicand
            print(paste(c((1L + (j - i)),":",(periods - i)," 
                          ",1L,":",(periods - j)), collapse = ""))

            vcov[j + 1L, i + 1L] <- 
                sum(mat[, (1L + (j - i)):(periods - i)] *
                                mat[, 1L:(periods - j)]) /
                                    (ind * (periods - j) - 1)
        }
    }
    vcov
}

OPalgo(mat, periods, ind, n_lags)
[1] "1:70 1:70"  ## contains "1:65 1:65"
[1] "2:70 1:69"
[1] "3:70 1:68"
[1] "4:70 1:67"
[1] "5:70 1:66"
[1] "6:70 1:65"
[1] "1:69 1:69"  ## contains "1:65 1:65"
[1] "2:69 1:68"
[1] "3:69 1:67"
[1] "4:69 1:66"
[1] "5:69 1:65"
[1] "1:68 1:68"  ## contains "1:65 1:65"
[1] "2:68 1:67"
[1] "3:68 1:66"
[1] "4:68 1:65"
[1] "1:67 1:67"  ## contains "1:65 1:65"
[1] "2:67 1:66"
[1] "3:67 1:65"
[1] "1:66 1:66"  ## contains "1:65 1:65"
[1] "2:66 1:65"
[1] "1:65 1:65"
As you can see, the product mat[,1:65] * mat[,1:65] is performed 6 times above. The only difference between the first occurrence and the last occurrence is that the first occurrence has an additional 5 columns. So instead of computing:
sum(mat[ , 1:70] * mat[ , 1:70])
sum(mat[ , 1:69] * mat[ , 1:69])
sum(mat[ , 1:68] * mat[ , 1:68])
sum(mat[ , 1:67] * mat[ , 1:67])
sum(mat[ , 1:66] * mat[ , 1:66])
sum(mat[ , 1:65] * mat[ , 1:65])
We can compute preCalc[1] <- sum(mat[ , 1:65] * mat[ , 1:65]) one time and use this in the other 5 calculations like so:
preCalc[1] + sum(mat[ , 66:70] * mat[ , 66:70])
preCalc[1] + sum(mat[ , 66:69] * mat[ , 66:69])
preCalc[1] + sum(mat[ , 66:68] * mat[ , 66:68])
preCalc[1] + sum(mat[ , 66:67] * mat[ , 66:67])
preCalc[1] + sum(mat[ , 66:66] * mat[ , 66:66])
In each of the above, we have reduce the number of multiplications by 90000 * 65 = 5,850,000 and the number of additions by 5,850,000 - 1 = 5,849,999 for a total of 11,699,999 arithmetic operations saved. The function below achieves this very thing.
fasterAlgo <- function(m, p, ind1, n) {
    vcov <- matrix(0, nrow = n + 1L, ncol = n + 1)
    preCals <- vapply(1:(n + 1L), function(x) sum(m[ , x:(p - n + x - 2L)] * 
                                               m[ , 1L:(p - n - 1L)]), 42.42)
    for (i in 0L:n) {
        for (j in i:n) {
            myNum <- preCals[1L + j - i] + sum(m[, (p - n + j - i):(p - i)] * m[, (p - n):(p - j)])
            vcov[j + 1L, i + 1L] <- myNum / (ind * (p - j) - 1)
        }
    }
    vcov
}

## outputs same results
all.equal(OPalgo(mat, periods, ind, n_lags), fasterAlgo(mat, periods, ind, n_lags))
[1] TRUE
Benchmarks:
## I commented out the print statements of the OPalgo before benchmarking
library(microbenchmark)
microbenchmark(OP = OPalgo(mat, periods, ind, n_lags),
               fasterBase = fasterAlgo(mat, periods, ind, n_lags),
               RcppOrig = compute_vcov(mat, n_lags), times = 5)
Unit: milliseconds
        expr       min        lq      mean   median        uq       max neval cld
          OP 2775.6110 2780.7207 2843.6012 2784.976 2899.7621 2976.9356     5   c
  fasterBase  863.3897  863.9681  865.5576  865.593  866.7962  868.0409     5  b 
    RcppOrig  160.1040  161.8922  162.0153  162.235  162.4756  163.3697     5 a  
As you can see, with this modification we see at least a 3 fold improvement but the Rcpp is still much faster.  Let's implement the above concept in Rcpp.
// [[Rcpp::export]]
NumericMatrix compute_vcov2(const NumericMatrix& mat, int n_lags) {

    NumericMatrix vcov(n_lags + 1, n_lags + 1);
    std::vector<double> preCalcs;
    preCalcs.reserve(n_lags + 1);
    double myCov;

    int i, j, k1, k2, l;
    int n = mat.nrow();
    int m = mat.ncol();

    for (i = 0; i <= n_lags; i++) {
        myCov = 0;
        for (k1 = i, k2 = 0; k2 < (m - n_lags - 1); k1++, k2++) {
            for (l = 0; l < n; l++) {
                myCov += mat(l, k1) * mat(l, k2); 
            }
        }
        preCalcs.push_back(myCov);
    }

    for (i = 0; i <= n_lags; i++) {
        for (j = i; j <= n_lags; j++) {
            myCov = preCalcs[j - i];
            for (k1 = m - n_lags + j - i - 1, k2 = m - n_lags - 1; k2 < (m - j); k1++, k2++) {
                for (l = 0; l < n; l++) {
                    myCov += mat(l, k1) * mat(l, k2);
                }
            }
            myCov /= n * (m - j) - 1;
            vcov(i, j) = vcov(j, i) = myCov;
        }
    }

    return vcov;
}

## gives same results
all.equal(compute_vcov2(mat, n_lags), compute_vcov(mat, n_lags))
[1] TRUE
New benchmarks:
microbenchmark(OP = OPalgo(mat, periods, ind, n_lags),
               fasterBase = fasterAlgo(mat, periods, ind, n_lags),
               RcppOrig = compute_vcov(mat, n_lags), 
               RcppModified = compute_vcov2(mat, n_lags), times = 5)
Unit: milliseconds
        expr       min         lq       mean     median         uq        max neval  cld
          OP 2785.4789 2786.67683 2811.02528 2789.37719 2809.61270 2883.98073     5    d
  fasterBase  866.5601  868.25555  888.64418  869.31796  870.92308  968.16417     5   c 
    RcppOrig  160.3467  161.37992  162.74899  161.73009  164.38653  165.90174     5  b  
RcppModified   51.1641   51.67149   52.87447   52.56067   53.06273   55.91334     5 a   
Now the enhanced Rcpp solution is around 3x faster the original Rcpp solution and around 50x faster than the original algorithm provided by the OP.

Update

We can do even better.  We can reverse the ranges of the indices i/j so as to continuously update preCalcs. This allows up to only compute the product of one new column every iteration.  This really comes into play as n_lags increases.  Observe:
// [[Rcpp::export]]
NumericMatrix compute_vcov3(const NumericMatrix& mat, int n_lags) {

    NumericMatrix vcov(n_lags + 1, n_lags + 1);
    std::vector<double> preCalcs;
    preCalcs.reserve(n_lags + 1);

    int i, j, k1, k2, l;
    int n = mat.nrow();
    int m = mat.ncol();

    for (i = 0; i <= n_lags; i++) {
        preCalcs.push_back(0);
        for (k1 = i, k2 = 0; k2 < (m - n_lags); k1++, k2++) {
            for (l = 0; l < n; l++) {
                preCalcs[i] += mat(l, k1) * mat(l, k2); 
            }
        }
    }

    for (i = n_lags; i >= 0; i--) {  ## reverse range
        for (j = n_lags; j >= i; j--) {   ## reverse range
            vcov(i, j) = vcov(j, i) = preCalcs[j - i] / (n * (m - j) - 1);
            if (i > 0 && i > 0) {
                for (k1 = m - i, k2 = m - j; k2 <= (m - j); k1++, k2++) {
                    for (l = 0; l < n; l++) {
                        ## updating preCalcs vector
                        preCalcs[j - i] += mat(l, k1) * mat(l, k2);  
                    }
                }
            }
        }
    }

    return vcov;
}

all.equal(compute_vcov(mat, n_lags), compute_vcov3(mat, n_lags))
[1] TRUE
Rcpp benchmarks only:
n_lags <- 50L
microbenchmark(RcppOrig = compute_vcov(mat, n_lags),
                 RcppModified = compute_vcov2(mat, n_lags),
                 RcppExtreme = compute_vcov3(mat, n_lags), times = 5)
Unit: milliseconds
        expr       min        lq      mean    median       uq       max neval cld
    RcppOrig 7035.7920 7069.7761 7083.4961 7070.3395 7119.028 7122.5446     5   c
RcppModified 3608.8986 3645.8585 3653.0029 3654.7209 3663.716 3691.8202     5  b 
 RcppExtreme  324.8252  330.7381  332.9657  333.5919  335.168  340.5054     5 a  
The newest implementation is now over 20x faster than the original Rcpp version and well over 300x faster than the original algorithm when n-lags is large.

                        这篇关于R中var-covar矩阵的高效计算的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！


                    
                        查看全文

R中var-covar矩阵的高效计算 [英] Efficient calculation of var-covar matrix in R

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

R中var-covar矩阵的高效计算 [英] Efficient calculation of var-covar matrix in R

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭