Cumulative mean not including the current observation - using cummean and group_by while ignoring NAs
Problem description
df <- data.frame(category=c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
value=c(NA,2,3,4,5,NA,7,8))
I'd like to add a new column to the above data frame that holds the cumulative mean of the value column up to the prior observation (i.e. not including the current observation), ignoring NAs. I've tried
df %>%
group_by(category, isna = is.na(value)) %>%
mutate(new_col = ifelse(isna, NA, cummean(lag(value))))
but cummean just doesn't know what to do with NAs, and unfortunately lag generates them.
I don't want to count NAs as 0.
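The NA propagation described above can be reproduced in isolation. The following is only a sketch in base R: lag_base and cummean_base are hypothetical stand-ins, assumed to behave like dplyr's lag and cummean on a plain numeric vector. It shows that shifting first puts an NA at the front, and the running sum then carries that NA through every later position.

```r
# Hypothetical base-R stand-ins (assumption: they mirror dplyr::lag and
# dplyr::cummean for a plain numeric vector without extra arguments).
lag_base <- function(x) c(NA, head(x, -1))
cummean_base <- function(x) cumsum(x) / seq_along(x)

x <- c(2, 4, 7)                  # the non-NA values of cat1 from the question

lagged <- lag_base(x)            # shifting introduces a leading NA
attempt <- cummean_base(lagged)  # cumsum propagates that NA to every element

print(lagged)
print(attempt)
```

Every element of attempt is NA, which is why the attempted new_col contains nothing useful even though the original NAs were already split into their own group.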
Recommended answer
One can first work out cummean and then take the lag of the result.
library(dplyr)
df %>%
group_by(category, isna = is.na(value)) %>%
mutate(new_col = lag(cummean(value))) %>%
ungroup() %>%
select(-isna)
# # A tibble: 8 x 3
# category value new_col
# <fctr> <dbl> <dbl>
# 1 cat1 NA NA
# 2 cat1 2.00 NA
# 3 cat2 3.00 NA
# 4 cat1 4.00 2.00
# 5 cat2 5.00 3.00
# 6 cat2 NA NA
# 7 cat1 7.00 3.00
# 8 cat2 8.00 4.00
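For comparison, the same column can probably be built without dplyr and without the helper isna grouping. The sketch below is an alternative, not the answer's method: prior_mean is a hypothetical helper name, and it assumes that rows whose own value is NA should get NA in the new column, matching the output above. It keeps a running sum and a running count of the non-NA values, shifts both by one position to exclude the current row, and divides.

```r
df <- data.frame(category = c("cat1", "cat1", "cat2", "cat1", "cat2", "cat2", "cat1", "cat2"),
                 value = c(NA, 2, 3, 4, 5, NA, 7, 8))

# Hypothetical helper: mean of all earlier non-NA values within one group.
prior_mean <- function(x) {
  ok <- !is.na(x)
  running_sum <- cumsum(ifelse(ok, x, 0))   # NAs add nothing to the sum
  running_n <- cumsum(ok)                   # NAs are not counted either
  prev_sum <- c(0, head(running_sum, -1))   # shift by one: exclude current row
  prev_n <- c(0, head(running_n, -1))
  # Assumption: NA rows, and rows with no earlier non-NA value, yield NA.
  ifelse(ok & prev_n > 0, prev_sum / prev_n, NA_real_)
}

# ave() applies the helper within each category and keeps the row order.
df$new_col <- ave(df$value, df$category, FUN = prior_mean)
print(df)
```

Because NAs are skipped rather than replaced, they are never counted as 0 in the mean; the zero used inside cumsum only keeps the running sum defined.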