R dplyr rolling sum
Problem description
I am implementing a rolling sum calculation with dplyr, but my data contain a number of groups with only one or a few observations, which causes an (Error: k <= n is not true) error, because the window size k is larger than the number of observations n in those groups. I have tried to work around this with filter and merge, but I am wondering whether there is a more elegant and automatic way to do this within dplyr. Please see the example below.
# create data
dg = expand.grid(site = c("Boston", "New York"),
                 year = 2000:2004)
dg$animal = "dog"
dg$animal[10] = "cat"
dg$animal = as.factor(dg$animal)
dg$count = rpois(nrow(dg), 5)
If I run the code below, I get the (Error: k <= n is not true) error, because there is only one row with "cat".
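library(dplyr)
library(zoo)  # assumption: rollsum() here comes from the zoo package

# rolling sum over a window of 2 rows within each site/animal group
dg2 = dg %>%
  arrange(site, year, animal) %>%
  group_by(site, animal) %>%
  # filter(animal == "dog") %>%
  mutate(roll_sum = rollsum(x = count, 2, align = "right", fill = NA))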
I have tried to solve this with the following code, which filters out the "cat" row and then merges the result back into the original data. However, I would like to know whether this can be done directly inside dplyr, especially since this solution requires knowing the number of rows in each group in advance and adjusting the filter manually whenever the window of the rolling sum changes.
dg2 = dg %>%
  arrange(site, year, animal) %>%
  group_by(site, animal) %>%
  filter(animal == "dog") %>%
  mutate(roll_sum = rollsum(x = count, 2, align = "right", fill = NA))
merge(dg, dg2, c("site", "year", "animal", "count"), all.x = TRUE)
       site year animal count roll_sum
1    Boston 2000    dog     5       NA
2    Boston 2001    dog     6       11
3    Boston 2002    dog     6       12
4    Boston 2003    dog     5       11
5    Boston 2004    dog     3        8
6  New York 2000    dog     8       NA
7  New York 2001    dog     3       11
8  New York 2002    dog    12       15
9  New York 2003    dog     3       15
10 New York 2004    cat     3       NA
Many thanks - W
Solution
You can instead use RcppRoll::roll_sum, which returns NA if the sample size (n) is less than the window size (k).
set.seed(1)
dg$count = rpois(dim(dg)[1], 5)
library(RcppRoll)
library(dplyr)
dg %>%
  arrange(site, year, animal) %>%
  group_by(site, animal) %>%
  mutate(roll_sum = roll_sum(count, 2, align = "right", fill = NA))
# site year animal count roll_sum
#1 Boston 2000 dog 4 NA
#2 Boston 2001 dog 5 9
#3 Boston 2002 dog 3 8
#4 Boston 2003 dog 9 12
#5 Boston 2004 dog 6 15
#6 New York 2000 dog 4 NA
#7 New York 2001 dog 8 12
#8 New York 2002 dog 8 16
#9 New York 2003 dog 6 14
#10 New York 2004 cat 2 NA
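If adding a package is not an option, the same "NA when the group is shorter than the window" behaviour can be sketched in base R. The helper name safe_roll_sum below is hypothetical and does not come from zoo, RcppRoll, or the original post. It assumes a right-aligned window and does not treat missing values specially: an NA in the input propagates to all later sums.

```r
# Hypothetical helper: right-aligned rolling sum over a window of k values.
# Returns NA for every position when the input has fewer than k values,
# instead of raising an error.
safe_roll_sum <- function(x, k) {
  n <- length(x)
  if (n < k) {
    return(rep(NA_real_, n))
  }
  cs <- cumsum(c(0, x))
  # The sum of x[(i - k + 1):i] equals cs[i + 1] - cs[i - k + 1]
  sums <- cs[(k + 1):(n + 1)] - cs[1:(n - k + 1)]
  # The first k - 1 positions have no complete window
  c(rep(NA_real_, k - 1), sums)
}

safe_roll_sum(c(4, 5, 3, 9, 6), 2)  # NA 9 8 12 15 (the Boston counts above)
safe_roll_sum(2, 2)                 # NA (a single "cat" row)
```

Inside the grouped pipeline from the question, this would be called as mutate(roll_sum = safe_roll_sum(count, 2)), so changing the window only means changing the second argument.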