Compute Variance per Period in R


Problem description


I'm working with a set of data that looks like the following:

team runs_scored       date
LAN           3        2014-03-22
ARI           1        2014-03-22
LAN           7        2014-03-23
ARI           5        2014-03-23
LAN           1        2014-03-30
SDN           3        2014-03-30

I'm trying to test a predictive model on this set and one of the input parameters is the variance of runs_scored in t-1. In other words, to predict the outcome variable for the 4th observation, I need the variance of LAN based on the prior observations in the dataset.

I can compute cumulative means and sums, but I can't quite figure out how to compute the cumulative variance in the data set. I'm doing most of my data manipulation in dplyr, but I'm not opposed to using an alternative solution if it gets me what I need.

Solution

Writing the variance formula out as (sum(x^2) - length(x)*mean(x)^2)/(length(x) - 1) shows that it generalizes easily to a cumulative variance: replace each function in it with its cumulative version (cummean is from dplyr). So:

library(dplyr)
cum_var <- function(x){
    n <- seq_along(x)  # running count: 1, 2, ..., length(x)
    # running sample variance; the first element is NaN (0/0)
    (cumsum(x^2)-n*cummean(x)^2)/(n-1)
}
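
As a sanity check, the cumulative formula can be compared with var() recomputed on every prefix of a short vector. This is only a sketch: prefix_var is a hypothetical reference helper (it is not @MrFlick's cumvar), and cumsum(x) / n stands in for dplyr's cummean so that the snippet runs in base R.

```r
# Same formula as cum_var above, with cumsum(x) / n in place of dplyr::cummean
cum_var <- function(x) {
  n <- seq_along(x)
  (cumsum(x^2) - n * (cumsum(x) / n)^2) / (n - 1)
}

# Hypothetical slow reference: call var() on each prefix x[1], x[1:2], ...
prefix_var <- function(x) {
  sapply(seq_along(x), function(i) var(x[seq_len(i)]))
}

x <- c(3, 7, 1)          # LAN's runs_scored from the question
fast <- cum_var(x)
slow <- prefix_var(x)

# The first entry is undefined in both (NaN vs NA); compare the rest
print(all.equal(fast[-1], slow[-1]))
```

Because the formula subtracts two sums that can be close in size, it may lose precision when the mean is large relative to the spread, so a check like this is worth repeating on data that resembles the real input.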

A speed comparison against @MrFlick's cumvar looks encouraging:

x <- rnorm(1e6)
all.equal(cum_var(x), cumvar(x))
#[1] TRUE
system.time(cumvar(x))[3]
elapsed 
   5.52 
system.time(cum_var(x))[3]
elapsed 
   0.04 
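
The function above works on a single vector, while the question asks for a per-team variance that uses only earlier games. One possible way to combine the two in dplyr is sketched below. The data frame name games and the column name prior_var are assumptions made for illustration; lagging the cumulative variance by one row within each team is what restricts it to observations before the current one.

```r
library(dplyr)

cum_var <- function(x) {
  n <- seq_along(x)
  (cumsum(x^2) - n * cummean(x)^2) / (n - 1)
}

# Sample data from the question (the name `games` is hypothetical)
games <- data.frame(
  team = c("LAN", "ARI", "LAN", "ARI", "LAN", "SDN"),
  runs_scored = c(3, 1, 7, 5, 1, 3),
  date = as.Date(c("2014-03-22", "2014-03-22", "2014-03-23",
                   "2014-03-23", "2014-03-30", "2014-03-30"))
)

result <- games %>%
  arrange(date) %>%                 # make sure rows are in time order
  group_by(team) %>%                # variance is computed per team
  # shift by one game so each row only sees that team's earlier games
  mutate(prior_var = dplyr::lag(cum_var(runs_scored))) %>%
  ungroup()

print(result)
```

A team's first two games have no usable prior variance (NA, then NaN), since at least two earlier observations are needed.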
