Rolling mean/standard deviation with conditions

Question

I have a question about computing a rolling mean/standard deviation based on conditions. To be honest, it is more of a syntax question, but since I think it is slowing my code down considerably, I thought I should ask here to find out what's going on. I have some financial data with columns such as Stock Name, Midquotes, etc., and I would like to compute the rolling mean and rolling standard deviation separately for each stock.

Right now I want to compute the volatility of each stock, defined as the rolling standard deviation of the previous 20 midquotes. After searching through the Stack Overflow forums, I found a line using the data.table package:

DT[, volatility:=( roll_sd(DT$Midquotes, 20, fill=0, align = "right") ), by = Stock]

Here, DT is the data.table that contains all my data.

Now, this is quite slow computationally, especially compared with a plain rolling standard deviation calculation without any conditions, such as:

DT$volatility <- roll_sd(DT$Midquotes, 20, fill=0, align = "right")
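For reference, here is a minimal base-R sketch of what a right-aligned rolling standard deviation with fill = 0 computes for a single vector. It assumes that roll_sd uses the sample standard deviation and fills the first n - 1 positions (which have no complete window) with the fill value; rolling_sd_right is a hypothetical helper written for illustration, not a function from any package, and its explicit loop is meant to show the logic rather than to be fast.

```r
# Hypothetical helper (not from any package): right-aligned rolling sample
# standard deviation over windows of `n` observations. Positions 1..(n - 1),
# which have no complete window, are set to `fill`.
rolling_sd_right <- function(x, n, fill = 0) {
  out <- rep(fill, length(x))
  if (length(x) >= n) {
    for (i in n:length(x)) {
      out[i] <- sd(x[(i - n + 1):i])  # window ends at position i
    }
  }
  out
}

rolling_sd_right(c(1, 2, 3, 4, 5), n = 3)  # returns 0 0 1 1 1
```

Each window of three consecutive integers has a sample standard deviation of exactly 1, which is why the last three values are 1.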

But when I try to do something similar for the rolling standard deviation with a condition, R will not let me:

DT$volatility <- DT[, ( roll_sd(DT$Midquotes, 20, fill=0, align = "right") ), by = Stock]

This line fails with the following error:

Error: cannot allocate vector of size 10.9 Gb

So I was just wondering: why is this line, DT[, volatility:=( roll_sd(DT$Midquotes, 20, fill=0, align = "right") ), by = Stock], so slow? Is it perhaps making a copy of the entire data.table each time the rolling standard deviation is computed for a different stock?

Answer

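I think your problem is how you combine the := function with a reference to DT inside the square brackets: inside j, DT$Midquotes is the entire column, not the rows of the current stock, so every group works on all of the data. I assume your setup is something like: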

> library(data.table)
> set.seed(83385668)
> DT <- data.table(
+   x     = rnorm(5 * 3), 
+   stock = c(sapply(letters[1:3], rep, times = 5)),
+   time  = c(replicate(3, 1:5)))
> DT
              x stock time
 1:  0.25073356     a    1
 2: -0.24408170     a    2
 3: -0.87475856     a    3
 4:  0.50843761     a    4
 5: -1.91331773     a    5
 6:  0.07850094     b    1
 7: -0.15922989     b    2
 8:  1.09806870     b    3
 9:  0.27995610     b    4
10:  0.45090842     b    5
11:  0.03400554     c    1
12: -0.34918734     c    2
13:  2.16602740     c    3
14: -0.04758261     c    4
15:  1.24869663     c    5

I am not sure which package the roll_sd function comes from. However, you can compute, for example, a rolling mean per stock with the zoo package as follows:

> library(zoo)
> setkey(DT, stock, time) # make sure data is sorted by time
> DT[, rollmean := rollmean(x, k = 3, fill = 0, align = "right"), 
+    by = .(stock)]
> DT
              x stock time   rollmean
 1:  0.25073356     a    1  0.0000000
 2: -0.24408170     a    2  0.0000000
 3: -0.87475856     a    3 -0.2893689
 4:  0.50843761     a    4 -0.2034676
 5: -1.91331773     a    5 -0.7598796
 6:  0.07850094     b    1  0.0000000
 7: -0.15922989     b    2  0.0000000
 8:  1.09806870     b    3  0.3391132
 9:  0.27995610     b    4  0.4062650
10:  0.45090842     b    5  0.6096444
11:  0.03400554     c    1  0.0000000
12: -0.34918734     c    2  0.0000000
13:  2.16602740     c    3  0.6169485
14: -0.04758261     c    4  0.5897525
15:  1.24869663     c    5  1.1223805
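The question is about volatility, i.e. a rolling standard deviation rather than a rolling mean. A minimal sketch of the same grouped pattern using zoo::rollapply() with FUN = sd, assuming a small hypothetical table named prices and a window of 3 instead of 20 so the result is easy to check by hand:

```r
library(data.table)
library(zoo)

# Hypothetical prices: 2 stocks with 4 midquotes each.
prices <- data.table(
  stock = rep(c("a", "b"), each = 4),
  time  = rep(1:4, times = 2),
  mid   = c(1, 2, 3, 5, 10, 10, 10, 16)
)
setkey(prices, stock, time)  # sort by stock, then by time within stock

# Rolling sample standard deviation over the last 3 midquotes, computed
# separately for each stock. Note the bare column name `mid`, not prices$mid.
# The first 2 rows of each stock have no complete window and get fill = 0.
prices[, volatility := rollapply(mid, width = 3, FUN = sd,
                                 fill = 0, align = "right"),
       by = stock]
prices
```

For stock a, the third value is sd(c(1, 2, 3)) = 1; for stock b, the window 10, 10, 10 gives 0. The asker's line would follow the same shape, with Midquotes in place of DT$Midquotes inside the brackets.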

Or, equivalently:

> DT[, `:=`(rollmean = rollmean(x, k = 3, fill = 0, align = "right")), 
+    by = .(stock)]
> DT
              x stock time   rollmean
 1:  0.25073356     a    1  0.0000000
 2: -0.24408170     a    2  0.0000000
 3: -0.87475856     a    3 -0.2893689
 4:  0.50843761     a    4 -0.2034676
 5: -1.91331773     a    5 -0.7598796
 6:  0.07850094     b    1  0.0000000
 7: -0.15922989     b    2  0.0000000
 8:  1.09806870     b    3  0.3391132
 9:  0.27995610     b    4  0.4062650
10:  0.45090842     b    5  0.6096444
11:  0.03400554     c    1  0.0000000
12: -0.34918734     c    2  0.0000000
13:  2.16602740     c    3  0.6169485
14: -0.04758261     c    4  0.5897525
15:  1.24869663     c    5  1.1223805
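Since speed was the original concern, one possible alternative (a swapped-in technique, not what this answer uses) is data.table's own frollmean(), available in data.table 1.12.0 and later, combined with the identity var = k / (k - 1) * (mean(x^2) - mean(x)^2) for the sample variance over a window of k values. This is a sketch under those assumptions with a hypothetical prices table; computing the variance from two means can lose precision when the values are large relative to their spread, as raw price levels often are, so it is worth checking against a direct computation such as zoo::rollapply(..., FUN = sd) on your own data.

```r
library(data.table)  # frollmean() requires data.table >= 1.12.0

# Hypothetical prices: 2 stocks with 4 midquotes each.
prices <- data.table(
  stock = rep(c("a", "b"), each = 4),
  time  = rep(1:4, times = 2),
  mid   = c(1, 2, 3, 5, 10, 10, 10, 16)
)
setkey(prices, stock, time)

k <- 3  # window length (20 in the question)

# Rolling mean and rolling sample sd per stock from two rolling means:
#   var = k / (k - 1) * (mean(x^2) - mean(x)^2)
# frollmean() is right-aligned by default; fill = 0 makes the first k - 1
# rows of each stock come out as 0 in both columns.
# pmax(0, ...) guards against tiny negative values caused by rounding.
prices[, `:=`(
  roll_mean = frollmean(mid, k, fill = 0),
  roll_sd   = sqrt(pmax(0, k / (k - 1) *
                     (frollmean(mid^2, k, fill = 0) -
                      frollmean(mid, k, fill = 0)^2)))
), by = stock]
prices
```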
