Cumulative mean not including the current observation - using cummean and group_by while ignoring NAs


Problem description

df <- data.frame(category=c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
                 value=c(NA,2,3,4,5,NA,7,8))

I'd like to add a new column to the above data frame that holds the cumulative mean of the value column up to the prior observation (i.e. not including the current observation), without taking NAs into account. I've tried

df %>%
  group_by(category, isna = is.na(value)) %>%
  mutate(new_col = ifelse(isna, NA, cummean(lag(value))))

but cummean just doesn't know what to do with NAs, and unfortunately lag generates them.

I don't want to count the NAs as 0.
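To make the failure concrete, here is a minimal sketch on a single vector (the non-NA values of cat1, pulled out by hand for illustration; the names x, attempt and swapped are assumptions, not part of the original code). lag puts an NA in the first position, and that NA then propagates through every later cumulative value.

```r
library(dplyr)

# Non-NA values of cat1 from the example data, extracted by hand
x <- c(2, 4, 7)

# lag() shifts the vector and fills the first slot with NA
lagged <- lag(x)

# cummean() carries that leading NA through every later element
attempt <- cummean(lagged)
print(is.na(attempt))

# Swapping the order keeps the NA out of the running mean:
# only the first element, which has no prior observation, is NA
swapped <- lag(cummean(x))
print(swapped)
```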

Recommended answer

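One can work out cummean first and then take the lag of the result. Because the grouping includes isna as well as category, the NA rows sit in groups of their own, so cummean only ever sees non-missing values within each category.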

library(dplyr)
df %>%
  group_by(category, isna = is.na(value)) %>%
  mutate(new_col = lag(cummean(value))) %>%
  ungroup() %>%
  select(-isna)


# # A tibble: 8 x 3
# category value new_col
# <fctr>   <dbl>   <dbl>
# 1 cat1     NA      NA   
# 2 cat1      2.00   NA   
# 3 cat2      3.00   NA   
# 4 cat1      4.00    2.00
# 5 cat2      5.00    3.00
# 6 cat2     NA      NA   
# 7 cat1      7.00    3.00
# 8 cat2      8.00    4.00
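As a cross-check, the same column can be built without the isna grouping by dividing a running sum by a running count of non-NA values. This is only a sketch of an alternative, not the answer's method; the helper columns prior_sum and prior_n and the result name out are made up for illustration.

```r
library(dplyr)

df <- data.frame(
  category = c("cat1", "cat1", "cat2", "cat1", "cat2", "cat2", "cat1", "cat2"),
  value    = c(NA, 2, 3, 4, 5, NA, 7, 8)
)

out <- df %>%
  group_by(category) %>%
  mutate(
    # Sum of all earlier non-NA values in the group (NA contributes nothing)
    prior_sum = lag(cumsum(coalesce(value, 0)), default = 0),
    # Number of earlier non-NA values in the group
    prior_n   = lag(cumsum(!is.na(value)), default = 0L),
    # Leave NA where the row itself is NA or has no prior observation
    new_col   = if_else(is.na(value) | prior_n == 0,
                        NA_real_,
                        prior_sum / prior_n)
  ) %>%
  ungroup() %>%
  select(-prior_sum, -prior_n)

print(out)
```

On the example data this should give the same new_col as the output above. The NAs are excluded from the count rather than treated as zeros, so they do not pull the mean down.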
