R dplyr rolling sum

Views: 149 Published: 2017/7/13 20:16:48 Tags: r dplyr


Problem description

I am implementing a rolling sum calculation through dplyr, but in my data a number of groups have only one or a few observations, which causes an error because the number of observations (n) is smaller than the window size (k). I have tried to resolve this in the example below with filter and merge, but I am wondering whether there is a way to do this more elegantly and automatically within dplyr. Please see the example below.

    #create data
    dg = expand.grid(site = c("Boston","New York"),
                     year = 2000:2004)
    dg$animal="dog"
    dg$animal[10]="cat";dg$animal=as.factor(dg$animal)
    dg$count = rpois(dim(dg)[1], 5)

If I run the code below, I get the error (Error: k <= n is not TRUE), because there is only one row with "cat".

    # rolling sum (rollsum() comes from the zoo package)
    library(dplyr)
    library(zoo)
    dg2 = dg %>%
      arrange(site, year, animal) %>%
      group_by(site, animal) %>%
      # filter(animal == "dog") %>%
      mutate(roll_sum = rollsum(x = count, 2, align = "right", fill = NA))

I have tried to solve this with the following code, which filters out the "cat" value and then merges the result back. However, I would like to know whether this can be done directly inside dplyr, especially since this solution requires knowing the number of rows in each group in advance and adjusting the filter manually whenever the window of the rolling sum changes.

    dg2 = dg %>%
      arrange(site, year, animal) %>%
      group_by(site, animal) %>%
      filter(animal == "dog") %>%
      mutate(roll_sum = rollsum(x = count, 2, align = "right", fill = NA))

    merge(dg, dg2, c("site", "year", "animal", "count"), all.x = TRUE)

           site year animal count roll_sum
    1    Boston 2000    dog     5       NA
    2    Boston 2001    dog     6       11
    3    Boston 2002    dog     6       12
    4    Boston 2003    dog     5       11
    5    Boston 2004    dog     3        8
    6  New York 2000    dog     8       NA
    7  New York 2001    dog     3       11
    8  New York 2002    dog    12       15
    9  New York 2003    dog     3       15
    10 New York 2004    cat     3       NA

Many thanks - W

Solution

You can instead use RcppRoll::roll_sum, which returns NA when the sample size (n) is smaller than the window size (k).

    set.seed(1)
    dg$count = rpois(dim(dg)[1], 5)
    library(RcppRoll)
    library(dplyr)
    dg %>%
      arrange(site, year, animal) %>%
      group_by(site, animal) %>%
      mutate(roll_sum = roll_sum(count, 2, align = "right", fill = NA))
    #       site year animal count roll_sum
    #1    Boston 2000    dog     4       NA
    #2    Boston 2001    dog     5        9
    #3    Boston 2002    dog     3        8
    #4    Boston 2003    dog     9       12
    #5    Boston 2004    dog     6       15
    #6  New York 2000    dog     4       NA
    #7  New York 2001    dog     8       12
    #8  New York 2002    dog     8       16
    #9  New York 2003    dog     6       14
    #10 New York 2004    cat     2       NA
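
If adding another package is not desirable, the same "NA when n < k" behaviour can be sketched with a small base R helper. The function name roll_sum_safe is hypothetical (it is not part of dplyr, zoo, or RcppRoll), and the sketch assumes a right-aligned window with NA fill, as in the code above.

```r
# Hypothetical helper: right-aligned rolling sum that returns all NA
# when the input is shorter than the window, instead of raising an error.
roll_sum_safe <- function(x, k) {
  n <- length(x)
  out <- rep(NA_real_, n)  # the first k - 1 positions stay NA
  if (n >= k) {
    # sum each window of k consecutive values ending at position i
    out[k:n] <- vapply(k:n, function(i) sum(x[(i - k + 1):i]), numeric(1))
  }
  out
}

roll_sum_safe(c(4, 5, 3, 9, 6), 2)  # NA 9 8 12 15
roll_sum_safe(2, 2)                 # NA (one observation, window of 2)
```

Inside the grouped pipeline this would be called as mutate(roll_sum = roll_sum_safe(count, 2)) after group_by(site, animal), so the window size only has to be changed in one place. It is slower than RcppRoll on long vectors, since each window is summed separately.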
