For each row in a data.frame, get the standard deviation of the previous n values in R


Question

I am struggling to get the standard deviation of the previous n values, or in my case, the last 5 days.

I have the following code as an example:

df<- data.frame(date = seq(as.Date("2019-12-01"), as.Date("2020-03-31"), by="days"),
                TRM= runif(122, min=3500, max=4100))
> df
          date      TRM
1   2019-12-01 3540.028
2   2019-12-02 3673.536
3   2019-12-03 3827.182
4   2019-12-04 3824.791
5   2019-12-05 3906.753
6   2019-12-06 3528.100
7   2019-12-07 3650.191
# ... with more rows

Then I use mutate to add some information that I need. Here are the last rows:

library(dplyr)  # provides mutate() and lag()
df<-mutate(df, diferencia = TRM - lag(TRM, 1),
           VAR=diferencia/lag(TRM, 1))
>df
          date      TRM  diferencia          VAR
118 2020-03-27 3779.479 -262.366328 -0.064912515
119 2020-03-28 3773.771   -5.708207 -0.001510316
120 2020-03-29 4097.078  323.307069  0.085672159 
121 2020-03-30 3752.619 -344.459061 -0.084074332 
122 2020-03-31 3707.442  -45.176979 -0.012038788 

So what I need is the following:

  1. Create a column that holds the sd of the column "VAR".
  2. The sd for each row must cover only the last 5 days of the column "VAR".
  3. If all this could be done with dplyr, that would be great. (Not necessary)

For example, for row 122 the result would be:

 > sd(df[118:122,4])
[1] 0.06630885

So what I want is this value for every row of my df. I used 5 days as an example, but I would like to be able to change the window:

          date      TRM  diferencia          VAR  diff5days
118 2020-03-27 3779.479 -262.366328 -0.064912515 0.05801765
119 2020-03-28 3773.771   -5.708207 -0.001510316 0.04799908
120 2020-03-29 4097.078  323.307069  0.085672159 0.06207932
121 2020-03-30 3752.619 -344.459061 -0.084074332 0.07522609
122 2020-03-31 3707.442  -45.176979 -0.012038788 0.06630885

Thanks!

Recommended answer

Here is a solution using base R:

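df<- data.frame(date = seq(as.Date("2019-12-01"), as.Date("2020-03-31"), by="days"),
                TRM= runif(122, min=3500, max=4100))
df$rownum <- seq_len(nrow(df))  # row index, shown in the output below
df$stDev <- NA

# rolling sd of TRM over the current row and the 4 rows before it
for(i in 5:nrow(df)) df$stDev[i] <- sd(df$TRM[(i - 4):i])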

...and the output:

> head(df,n = 10)
         date      TRM rownum     stDev
1  2019-12-01 3553.666      1        NA
2  2019-12-02 4054.015      2        NA
3  2019-12-03 3976.555      3        NA
4  2019-12-04 3825.628      4        NA
5  2019-12-05 4036.383      5 208.01581
6  2019-12-06 3787.414      6 122.38142
7  2019-12-07 3886.663      7 103.45743
8  2019-12-08 3930.801      8  97.10099
9  2019-12-09 3626.911      9 155.10571
10 2019-12-10 3781.731     10 117.29726
>

We can verify the first three computed values as follows:

> # verify first three results
> sd(df$TRM[1:5])
[1] 208.0158
> sd(df$TRM[2:6])
[1] 122.3814
> sd(df$TRM[3:7])
[1] 103.4574
>
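
The loop above hard-codes a 5-row window and runs on TRM, while the question asks for an adjustable window on the VAR column. A minimal sketch of one way to generalize it follows. The helper name roll_sd and the column name sd5 are hypothetical (they do not appear in the original answer), and the seed is set only so the example is reproducible.

```r
# Hypothetical helper: standard deviation over a trailing window of n values.
# Positions with fewer than n preceding values are left as NA.
roll_sd <- function(x, n = 5) {
  out <- rep(NA_real_, length(x))
  if (length(x) < n) return(out)
  for (i in n:length(x)) {
    out[i] <- sd(x[(i - n + 1):i])  # current value plus the n - 1 before it
  }
  out
}

set.seed(42)  # assumption: fixed seed, only for reproducibility
df <- data.frame(
  date = seq(as.Date("2019-12-01"), as.Date("2020-03-31"), by = "days"),
  TRM  = runif(122, min = 3500, max = 4100)
)

# Daily relative change, equivalent to diferencia / lag(TRM, 1) in the question.
# The first value has no predecessor, so it is NA.
df$VAR <- c(NA, diff(df$TRM) / head(df$TRM, -1))

# Rolling sd of VAR over the last 5 rows; change n to use another window.
df$sd5 <- roll_sd(df$VAR, n = 5)

tail(df, 5)
```

Because VAR is NA in row 1, any window that includes row 1 also yields NA, so the first non-missing value of sd5 is in row 6. Since roll_sd takes and returns a plain vector, it should also work inside a dplyr pipeline, for example mutate(df, sd5 = roll_sd(VAR, n = 5)).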
