R中var-covar矩阵的高效计算 [英] Efficient calculation of var-covar matrix in R
问题描述
我正在寻找通过 t, t-1
等随时间t
的单个测量计算(自动)协方差矩阵的效率增益.>
在数据矩阵中,每一行代表一个人,每一列代表每月的测量值(列按时间顺序排列).类似于以下数据(尽管有更多的协方差).
#模拟数据set.seed(1)周期<- 70L工业 <- 90000Lmat <- sapply(rep(ind, period), rnorm)
下面是我想出的(丑陋的)代码来获得测量/滞后测量的协方差矩阵.运行大约需要 4 秒.我确信通过移动到 data.table
,多思考而不依赖循环,我可以大大减少时间.但是由于协方差矩阵无处不在,我怀疑在 R 中已经存在一种标准(且有效)的方法来做到这一点,我应该首先了解它.
# 获取 0-5 滞后的方差协方差矩阵n_lags <- 5L # 滞后数vcov <- 矩阵(0,nrow = n_lags + 1L,ncol = n_lags + 1)for (i in 0L:n_lags) {for (j in i:n_lags) {vcov[j + 1L, i + 1L] <-sum(mat[, (1L + (j - i)):(periods - i)] *垫 [, 1L:(句点 - j)])/(ind * (句点 - j) - 1)}}轮(vcov,3)[,1] [,2] [,3] [,4] [,5] [,6][1,] 1.001 0.000 0.000 0.000 0.000 0.000[2,] 0.000 1.001 0.000 0.000 0.000 0.000[3,] 0.000 0.000 1.001 0.000 0.000 0.000[4,] 0.000 0.000 0.000 1.001 0.000 0.000[5,] -0.001 0.000 0.000 0.000 1.001 0.000[6,] 0.000 -0.001 0.000 0.000 0.000 1.001
@F.Privé 的 Rcpp
实现是一个很好的起点,但我们可以做得更好.您会注意到在 OP 提供的主要算法中,有许多重复的相当昂贵的计算.观察:
OPalgo <- function(m, p, ind1, n) {vcov <- 矩阵(0,nrow = n + 1L,ncol = n + 1)for (i in 0L:n) {for (j in i:n) {## 第一个 & 的下限和上限范围第二个被乘数打印(粘贴(c((1L +(j - i)),:",(句点 - i),",1L,":",(句点 - j)), collapse = ""))vcov[j + 1L, i + 1L] <-sum(mat[, (1L + (j - i)):(periods - i)] *垫 [, 1L:(句点 - j)])/(ind * (句点 - j) - 1)}}病毒}OPalgo(垫子,句点,ind,n_lags)[1] "1:70 1:70" ## 包含 "1:65 1:65"[1] 2:70 1:69"[1] 3:70 1:68"[1] 4:70 1:67"[1] 5:70 1:66"[1]6:70 1:65"[1] "1:69 1:69" ## 包含 "1:65 1:65"[1] 2:69 1:68"[1] 3:69 1:67"[1] 4:69 1:66"[1] 5:69 1:65"[1] "1:68 1:68" ## 包含 "1:65 1:65"[1] 2:68 1:67"[1] "3:68 1:66"[1] 4:68 1:65"[1] "1:67 1:67" ## 包含 "1:65 1:65"[1] 2:67 1:66"[1] 3:67 1:65"[1] "1:66 1:66" ## 包含 "1:65 1:65"[1] 2:66 1:65"[1] "1:65 1:65"
如您所见,乘积 mat[,1:65] * mat[,1:65]
在上面执行了 6 次.第一次出现和最后一次出现之间的唯一区别是第一次出现有额外的 5 列.所以不是计算:
sum(mat[ , 1:70] * mat[ , 1:70])sum(mat[, 1:69] * mat[, 1:69])sum(mat[, 1:68] * mat[, 1:68])sum(mat[, 1:67] * mat[, 1:67])sum(mat[, 1:66] * mat[, 1:66])sum(mat[, 1:65] * mat[, 1:65])
我们可以计算 preCalc[1] <- sum(mat[ , 1:65] * mat[ , 1:65])
一次,然后在其他 5 次计算中使用它,像这样:
preCalc[1] + sum(mat[ , 66:70] * mat[ , 66:70])preCalc[1] + sum(mat[, 66:69] * mat[, 66:69])preCalc[1] + sum(mat[, 66:68] * mat[, 66:68])preCalc[1] + sum(mat[, 66:67] * mat[, 66:67])preCalc[1] + sum(mat[, 66:66] * mat[, 66:66])
在上面的每一个中,我们减少了90000 * 65 = 5,850,000
的乘法次数和5,850,000 - 1 = 5,849,999
的加法次数保存了 11,699,999
个算术运算.下面的函数实现了这一点.
fasterAlgo <- function(m, p, ind1, n) {vcov <- 矩阵(0,nrow = n + 1L,ncol = n + 1)preCals <- vapply(1:(n + 1L), function(x) sum(m[, x:(p - n + x - 2L)] *m[, 1L:(p - n - 1L)]), 42.42)for (i in 0L:n) {for (j in i:n) {myNum <- preCals[1L + j - i] + sum(m[, (p - n + j - i):(p - i)] * m[, (p - n):(p - j)])vcov[j + 1L, i + 1L] <- myNum/(ind * (p - j) - 1)}}病毒}## 输出相同的结果all.equal(OPalgo(mat, period, ind, n_lags), fastAlgo(mat, period, ind, n_lags))[1] 真
基准:
## 我在基准测试之前注释掉了 OPalgo 的打印语句图书馆(微基准)微基准(OP = OPalgo(垫子,句号,ind,n_lags),fastBase = fastAlgo(mat, period, ind, n_lags),RcppOrig = compute_vcov(mat, n_lags), times = 5)单位:毫秒expr min lq 平均中位数 uq max neval cldOP 2775.6110 2780.7207 2843.6012 2784.976 2899.7621 2976.9356 5 cfastBase 863.3897 863.9681 865.5576 865.593 866.7962 868.0409 5 bRcppOrig 160.1040 161.8922 162.0153 162.235 162.4756 163.3697 5 a
如您所见,通过此修改,我们看到至少有 3 倍的改进,但 Rcpp
仍然要快得多.让我们在Rcpp
中实现上面的概念.
//[[Rcpp::export]]NumericMatrix compute_vcov2(const NumericMatrix& mat, int n_lags) {NumericMatrix vcov(n_lags + 1, n_lags + 1);std::vector预计算;preCalcs.reserve(n_lags + 1);双 myCov;int i, j, k1, k2, l;int n = mat.nrow();int m = mat.ncol();for (i = 0; i <= n_lags; i++) {我的冠状病毒 = 0;对于 (k1 = i, k2 = 0; k2 <(m - n_lags - 1); k1++, k2++) {for (l = 0; l
新基准:
microbenchmark(OP = OPalgo(mat, period, ind, n_lags),fastBase = fastAlgo(mat, period, ind, n_lags),RcppOrig = compute_vcov(mat, n_lags),RcppModified = compute_vcov2(mat, n_lags), times = 5)单位:毫秒expr min lq 平均中位数 uq max neval cldOP 2785.4789 2786.67683 2811.02528 2789.37719 2809.61270 2883.98073 5 天fastBase 866.5601 868.25555 888.64418 869.31796 870.92308 968.16417 5 cRcppOrig 160.3467 161.37992 162.74899 161.73009 164.38653 165.90174 5 bRcpp 修改 51.1641 51.67149 52.87447 52.56067 53.06273 55.91334 5 a
现在增强的 Rcpp
解决方案比原始 Rcpp
解决方案快 3 倍左右,比 OP 提供的原始算法快 50 倍左右.
更新
我们可以做得更好.我们可以颠倒索引 i/j 的范围,从而不断更新 preCalcs
.这允许每次迭代最多只计算一个新列的乘积.随着 n_lags
的增加,这确实发挥了作用.观察:
//[[Rcpp::export]]NumericMatrix compute_vcov3(const NumericMatrix& mat, int n_lags) {NumericMatrix vcov(n_lags + 1, n_lags + 1);std::vector预计算;preCalcs.reserve(n_lags + 1);int i, j, k1, k2, l;int n = mat.nrow();int m = mat.ncol();for (i = 0; i <= n_lags; i++) {preCalcs.push_back(0);对于 (k1 = i, k2 = 0; k2 <(m - n_lags); k1++, k2++) {for (l = 0; l = 0; i--) { ## 反向范围for (j = n_lags; j >= i; j--) { ## 反向范围vcov(i, j) = vcov(j, i) = preCalcs[j - i]/(n * (m - j) - 1);如果 (i > 0 && i > 0) {对于 (k1 = m - i, k2 = m - j; k2 <= (m - j); k1++, k2++) {for (l = 0; l
Rcpp
仅基准:
n_lags <- 50L微基准(RcppOrig = compute_vcov(mat, n_lags),RcppModified = compute_vcov2(mat, n_lags),RcppExtreme = compute_vcov3(mat, n_lags), times = 5)单位:毫秒expr min lq 平均中位数 uq max neval cldRcppOrig 7035.7920 7069.7761 7083.4961 7070.3395 7119.028 7122.5446 5 cRcpp 修改 3608.8986 3645.8585 3653.0029 3654.7209 3663.716 3691.8202 5 bRcppExtreme 324.8252 330.7381 332.9657 333.5919 335.168 340.5054 5 a
最新的实现现在比原始 Rcpp
版本快 20 多倍,并且当 n-lags
很大时比原始算法快 300 多倍.
I'm looking for efficiency gains in calculating the (auto)covariance matrix from individual measurements over time t
with t, t-1
, etc..
In the data matrix, each row represents an individual and each column represents monthly measurements (the columns are in time order). Similar to the following data (although with some more co-variance).
# simulate data
set.seed(1)
periods <- 70L
ind <- 90000L
mat <- sapply(rep(ind, periods), rnorm)
Below is the (ugly) code I came up with to get the covariance matrix for measurements/ lagged measurements. It takes almost 4 seconds to run. I'm sure that by moving to data.table
, thinking more and not relying on loops I could cut the time by a big amount. But since covariance matrices are ubiquitous I suspect there already exists a standard (and efficient) way to do this in R that I should know about first.
# Get variance covariance matrix for 0-5 lags
n_lags <- 5L # Number of lags
vcov <- matrix(0, nrow = n_lags + 1L, ncol = n_lags + 1)
for (i in 0L:n_lags) {
for (j in i:n_lags) {
vcov[j + 1L, i + 1L] <-
sum(mat[, (1L + (j - i)):(periods - i)] *
mat[, 1L:(periods - j)]) /
(ind * (periods - j) - 1)
}
}
round(vcov, 3)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1.001 0.000 0.000 0.000 0.000 0.000
[2,] 0.000 1.001 0.000 0.000 0.000 0.000
[3,] 0.000 0.000 1.001 0.000 0.000 0.000
[4,] 0.000 0.000 0.000 1.001 0.000 0.000
[5,] -0.001 0.000 0.000 0.000 1.001 0.000
[6,] 0.000 -0.001 0.000 0.000 0.000 1.001
@F. Privé's Rcpp
implementation is a good starting place, but we can do better. You will notice in the main algorithm supplied by the OP that there are many replicated fairly expensive calculations. Observe:
OPalgo <- function(m, p, ind1, n) {
vcov <- matrix(0, nrow = n + 1L, ncol = n + 1)
for (i in 0L:n) {
for (j in i:n) {
## lower and upper range for the first & second multiplicand
print(paste(c((1L + (j - i)),":",(periods - i),"
",1L,":",(periods - j)), collapse = ""))
vcov[j + 1L, i + 1L] <-
sum(mat[, (1L + (j - i)):(periods - i)] *
mat[, 1L:(periods - j)]) /
(ind * (periods - j) - 1)
}
}
vcov
}
OPalgo(mat, periods, ind, n_lags)
[1] "1:70 1:70" ## contains "1:65 1:65"
[1] "2:70 1:69"
[1] "3:70 1:68"
[1] "4:70 1:67"
[1] "5:70 1:66"
[1] "6:70 1:65"
[1] "1:69 1:69" ## contains "1:65 1:65"
[1] "2:69 1:68"
[1] "3:69 1:67"
[1] "4:69 1:66"
[1] "5:69 1:65"
[1] "1:68 1:68" ## contains "1:65 1:65"
[1] "2:68 1:67"
[1] "3:68 1:66"
[1] "4:68 1:65"
[1] "1:67 1:67" ## contains "1:65 1:65"
[1] "2:67 1:66"
[1] "3:67 1:65"
[1] "1:66 1:66" ## contains "1:65 1:65"
[1] "2:66 1:65"
[1] "1:65 1:65"
As you can see, the product mat[,1:65] * mat[,1:65]
is performed 6 times above. The only difference between the first occurrence and the last occurrence is that the first occurrence has an additional 5 columns. So instead of computing:
sum(mat[ , 1:70] * mat[ , 1:70])
sum(mat[ , 1:69] * mat[ , 1:69])
sum(mat[ , 1:68] * mat[ , 1:68])
sum(mat[ , 1:67] * mat[ , 1:67])
sum(mat[ , 1:66] * mat[ , 1:66])
sum(mat[ , 1:65] * mat[ , 1:65])
We can compute preCalc[1] <- sum(mat[ , 1:65] * mat[ , 1:65])
one time and use this in the other 5 calculations like so:
preCalc[1] + sum(mat[ , 66:70] * mat[ , 66:70])
preCalc[1] + sum(mat[ , 66:69] * mat[ , 66:69])
preCalc[1] + sum(mat[ , 66:68] * mat[ , 66:68])
preCalc[1] + sum(mat[ , 66:67] * mat[ , 66:67])
preCalc[1] + sum(mat[ , 66:66] * mat[ , 66:66])
In each of the above, we have reduce the number of multiplications by 90000 * 65 = 5,850,000
and the number of additions by 5,850,000 - 1 = 5,849,999
for a total of 11,699,999
arithmetic operations saved. The function below achieves this very thing.
fasterAlgo <- function(m, p, ind1, n) {
vcov <- matrix(0, nrow = n + 1L, ncol = n + 1)
preCals <- vapply(1:(n + 1L), function(x) sum(m[ , x:(p - n + x - 2L)] *
m[ , 1L:(p - n - 1L)]), 42.42)
for (i in 0L:n) {
for (j in i:n) {
myNum <- preCals[1L + j - i] + sum(m[, (p - n + j - i):(p - i)] * m[, (p - n):(p - j)])
vcov[j + 1L, i + 1L] <- myNum / (ind * (p - j) - 1)
}
}
vcov
}
## outputs same results
all.equal(OPalgo(mat, periods, ind, n_lags), fasterAlgo(mat, periods, ind, n_lags))
[1] TRUE
Benchmarks:
## I commented out the print statements of the OPalgo before benchmarking
library(microbenchmark)
microbenchmark(OP = OPalgo(mat, periods, ind, n_lags),
fasterBase = fasterAlgo(mat, periods, ind, n_lags),
RcppOrig = compute_vcov(mat, n_lags), times = 5)
Unit: milliseconds
expr min lq mean median uq max neval cld
OP 2775.6110 2780.7207 2843.6012 2784.976 2899.7621 2976.9356 5 c
fasterBase 863.3897 863.9681 865.5576 865.593 866.7962 868.0409 5 b
RcppOrig 160.1040 161.8922 162.0153 162.235 162.4756 163.3697 5 a
As you can see, with this modification we see at least a 3 fold improvement but the Rcpp
is still much faster. Let's implement the above concept in Rcpp
.
// [[Rcpp::export]]
NumericMatrix compute_vcov2(const NumericMatrix& mat, int n_lags) {
NumericMatrix vcov(n_lags + 1, n_lags + 1);
std::vector<double> preCalcs;
preCalcs.reserve(n_lags + 1);
double myCov;
int i, j, k1, k2, l;
int n = mat.nrow();
int m = mat.ncol();
for (i = 0; i <= n_lags; i++) {
myCov = 0;
for (k1 = i, k2 = 0; k2 < (m - n_lags - 1); k1++, k2++) {
for (l = 0; l < n; l++) {
myCov += mat(l, k1) * mat(l, k2);
}
}
preCalcs.push_back(myCov);
}
for (i = 0; i <= n_lags; i++) {
for (j = i; j <= n_lags; j++) {
myCov = preCalcs[j - i];
for (k1 = m - n_lags + j - i - 1, k2 = m - n_lags - 1; k2 < (m - j); k1++, k2++) {
for (l = 0; l < n; l++) {
myCov += mat(l, k1) * mat(l, k2);
}
}
myCov /= n * (m - j) - 1;
vcov(i, j) = vcov(j, i) = myCov;
}
}
return vcov;
}
## gives same results
all.equal(compute_vcov2(mat, n_lags), compute_vcov(mat, n_lags))
[1] TRUE
New benchmarks:
microbenchmark(OP = OPalgo(mat, periods, ind, n_lags),
fasterBase = fasterAlgo(mat, periods, ind, n_lags),
RcppOrig = compute_vcov(mat, n_lags),
RcppModified = compute_vcov2(mat, n_lags), times = 5)
Unit: milliseconds
expr min lq mean median uq max neval cld
OP 2785.4789 2786.67683 2811.02528 2789.37719 2809.61270 2883.98073 5 d
fasterBase 866.5601 868.25555 888.64418 869.31796 870.92308 968.16417 5 c
RcppOrig 160.3467 161.37992 162.74899 161.73009 164.38653 165.90174 5 b
RcppModified 51.1641 51.67149 52.87447 52.56067 53.06273 55.91334 5 a
Now the enhanced Rcpp
solution is around 3x faster the original Rcpp
solution and around 50x faster than the original algorithm provided by the OP.
Update
We can do even better. We can reverse the ranges of the indices i/j so as to continuously update preCalcs
. This allows up to only compute the product of one new column every iteration. This really comes into play as n_lags
increases. Observe:
// [[Rcpp::export]]
NumericMatrix compute_vcov3(const NumericMatrix& mat, int n_lags) {
NumericMatrix vcov(n_lags + 1, n_lags + 1);
std::vector<double> preCalcs;
preCalcs.reserve(n_lags + 1);
int i, j, k1, k2, l;
int n = mat.nrow();
int m = mat.ncol();
for (i = 0; i <= n_lags; i++) {
preCalcs.push_back(0);
for (k1 = i, k2 = 0; k2 < (m - n_lags); k1++, k2++) {
for (l = 0; l < n; l++) {
preCalcs[i] += mat(l, k1) * mat(l, k2);
}
}
}
for (i = n_lags; i >= 0; i--) { ## reverse range
for (j = n_lags; j >= i; j--) { ## reverse range
vcov(i, j) = vcov(j, i) = preCalcs[j - i] / (n * (m - j) - 1);
if (i > 0 && i > 0) {
for (k1 = m - i, k2 = m - j; k2 <= (m - j); k1++, k2++) {
for (l = 0; l < n; l++) {
## updating preCalcs vector
preCalcs[j - i] += mat(l, k1) * mat(l, k2);
}
}
}
}
}
return vcov;
}
all.equal(compute_vcov(mat, n_lags), compute_vcov3(mat, n_lags))
[1] TRUE
Rcpp
benchmarks only:
n_lags <- 50L
microbenchmark(RcppOrig = compute_vcov(mat, n_lags),
RcppModified = compute_vcov2(mat, n_lags),
RcppExtreme = compute_vcov3(mat, n_lags), times = 5)
Unit: milliseconds
expr min lq mean median uq max neval cld
RcppOrig 7035.7920 7069.7761 7083.4961 7070.3395 7119.028 7122.5446 5 c
RcppModified 3608.8986 3645.8585 3653.0029 3654.7209 3663.716 3691.8202 5 b
RcppExtreme 324.8252 330.7381 332.9657 333.5919 335.168 340.5054 5 a
The newest implementation is now over 20x faster than the original Rcpp
version and well over 300x faster than the original algorithm when n-lags
is large.
这篇关于R中var-covar矩阵的高效计算的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!