Cumulative Calculations (e.g. cumulative correlation) with data.table in R
Question
In R, I have a data.table with two measurements, red and green, and would like to calculate their cumulative correlation.
library(data.table)
DT <- data.table(red   = c(1, 2, 3, 4, 5, 6.5, 7.6, 8.7),
                 green = c(2, 4, 6, 8, 10, 12, 14, 16),
                 id    = 1:8)
How can I get the following output within one data.table command?
...
> DT[1:5, cor(red, green)]
[1] 1 # should go into row 5
> DT[1:6, cor(red, green)]
[1] 0.9970501 # should go into row 6, and so on ...
> DT[1:7, cor(red, green)]
[1] 0.9976889
Edit:
I am aware that this can be solved by looping, but my data.table has about 1 million rows grouped into smaller chunks, so looping is rather slow, and I thought there might be another way.
Recommended answer
Building on my answer to a similar question about cumulative variances, you can compute cumulative covariances as
library(dplyr) # for cummean

# Cumulative sample covariance of x and y over the first 1..n observations
cum_cov <- function(x, y) {
  n <- seq_along(x)
  res <- cumsum(x*y) - cummean(x)*cumsum(y) - cummean(y)*cumsum(x) + n*cummean(x)*cummean(y)
  res / (n - 1)  # NaN at n = 1
}

# Cumulative sample variance (copy-pasted from the previous answer)
cum_var <- function(x) {
  n <- seq_along(x)
  (cumsum(x^2) - n*cummean(x)^2) / (n - 1)
}
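For readers who want to see why these expressions work, the identities behind the two functions can be written out as follows. This is only a sketch of the standard sample-covariance algebra; the symbols $\bar{x}_n$ and $\bar{y}_n$ (running means of the first $n$ values) are notation introduced here and do not appear in the original answer.

```latex
\begin{aligned}
(n-1)\,\operatorname{cov}_n(x,y)
  &= \sum_{i=1}^{n} (x_i - \bar{x}_n)(y_i - \bar{y}_n) \\
  &= \sum_{i=1}^{n} x_i y_i
     - \bar{x}_n \sum_{i=1}^{n} y_i
     - \bar{y}_n \sum_{i=1}^{n} x_i
     + n\,\bar{x}_n \bar{y}_n \\
  &= \sum_{i=1}^{n} x_i y_i - n\,\bar{x}_n \bar{y}_n, \\[4pt]
(n-1)\,\operatorname{var}_n(x)
  &= \sum_{i=1}^{n} x_i^2 - n\,\bar{x}_n^2 .
\end{aligned}
```

Each sum over $i = 1, \dots, n$ is a `cumsum` and each running mean is a `cummean`, so every term is available for all $n$ in a single vectorized pass. The second line of the covariance is the four-term form used in `cum_cov`; the third line is its simplification.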
The cumulative correlation is then
cum_cor <- function(x, y) cum_cov(x, y)/sqrt(cum_var(x)*cum_var(y))
DT[, cumcor := cum_cor(red, green)]
DT
red green id cumcor
1: 1.0 2 1 NaN
2: 2.0 4 2 1.0000000
3: 3.0 6 3 1.0000000
4: 4.0 8 4 1.0000000
5: 5.0 10 5 1.0000000
6: 6.5 12 6 0.9970501
7: 7.6 14 7 0.9976889
8: 8.7 16 8 0.9983762
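Since the question mentions data grouped into smaller chunks, the same functions can presumably be applied per group with data.table's `by` argument. The following is a minimal sketch under assumptions: the grouping column `grp` and the table `DT2` are hypothetical, and `cum_mean` is a base-R stand-in for `dplyr::cummean` so that the example needs only data.table.

```r
library(data.table)

# Base-R stand-in for dplyr::cummean (assumption: used to avoid the dplyr dependency)
cum_mean <- function(x) cumsum(x) / seq_along(x)

# Algebraically equal to the four-term expression in the answer
cum_cov <- function(x, y) {
  n <- seq_along(x)
  (cumsum(x * y) - n * cum_mean(x) * cum_mean(y)) / (n - 1)
}

cum_var <- function(x) {
  n <- seq_along(x)
  (cumsum(x^2) - n * cum_mean(x)^2) / (n - 1)
}

cum_cor <- function(x, y) cum_cov(x, y) / sqrt(cum_var(x) * cum_var(y))

# Hypothetical data: `grp` is an assumed grouping column, not part of the question
set.seed(1)
DT2 <- data.table(grp   = rep(c("a", "b"), each = 5),
                  red   = rnorm(10),
                  green = rnorm(10))

# The cumulative correlation restarts within each group
DT2[, cumcor := cum_cor(red, green), by = grp]

# Cross-check one group against cor() on growing prefixes (the slow loop approach)
a <- DT2[grp == "a"]
naive <- sapply(2:5, function(k) cor(a$red[1:k], a$green[1:k]))
all.equal(a$cumcor[2:5], naive)
```

The first row of each group is NaN, because a correlation is undefined for a single observation.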
I hope that's fast enough:
x <- rnorm(1e6)
y <- rnorm(1e6)+x
system.time(cum_cor(x, y))
# user system elapsed
# 0.319 0.020 0.339