Dealing with NAs when calculating mean (summarize_each) on group_by

Question

I have a data frame md:

md <- data.frame(x = c(3,5,4,5,3,5), y = c(5,5,5,4,4,1), z = c(1,3,4,3,5,5),
      device1 = c("c","a","a","b","c","c"), device2 = c("B","A","A","A","B","B"))
md[2,3] <- NA
md[4,1] <- NA
md

I want to calculate means by device1 / device2 combinations using dplyr:

library(dplyr)
md %>% group_by(device1, device2) %>% summarise_each(funs(mean))

However, I am getting some NAs. I want the NAs to be ignored (na.rm = TRUE). I tried, but the function does not accept this argument. Both of these lines result in an error:

md %>% group_by(device1, device2) %>% summarise_each(funs(mean), na.rm = TRUE)
md %>% group_by(device1, device2) %>% summarise_each(funs(mean, na.rm = TRUE))

Recommended answer

The other answers showed you the syntax for passing mean(., na.rm = TRUE) into summarize/_each.
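For reference, here is a minimal sketch of that syntax applied to the md data frame from the question. It assumes dplyr 1.0.0 or later, where across() supersedes the deprecated summarise_each() and funs(); the older spelling appears only as a comment. The result name res is illustrative.

```r
library(dplyr)

# Data frame from the question
md <- data.frame(x = c(3,5,4,5,3,5), y = c(5,5,5,4,4,1), z = c(1,3,4,3,5,5),
      device1 = c("c","a","a","b","c","c"), device2 = c("B","A","A","A","B","B"))
md[2,3] <- NA
md[4,1] <- NA

# Older dplyr spelling (deprecated since then):
#   md %>% group_by(device1, device2) %>% summarise_each(funs(mean(., na.rm = TRUE)))

# The lambda forwards na.rm = TRUE to mean() for every non-grouping column
res <- md %>%
  group_by(device1, device2) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)), .groups = "drop")

res
```

Note that the group b / A contains a single row whose x is NA, so its mean of x comes back as NaN rather than NA: after removing the NA there is nothing left to average.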

Personally, I deal with this so often, and find it so annoying, that I define the following convenience set of NA-aware basic functions (e.g. in my .Rprofile). You can then apply them in dplyr with summarize(mean_) and no pesky argument passing. It also keeps the source code cleaner and more readable, which is another strong plus:

mean_   <- function(...) mean(..., na.rm = TRUE)
median_ <- function(...) median(..., na.rm = TRUE)
sum_    <- function(...) sum(..., na.rm = TRUE)
# Population SD (divides by n, not n - 1); n counts only the non-NA values
sd_     <- function(v)   sqrt(sum_((v - mean_(v))^2) / sum(!is.na(v)))
cor_    <- function(...) cor(..., use = 'pairwise.complete.obs')
table_  <- function(...) table(..., useNA = 'ifany')
mode_   <- function(...) {
  tab <- table(...)            # table() drops NA values by default
  names(tab[tab == max(tab)])  # every value tied for the highest count
}
# Clamp values to the range [minval, maxval]
clamp_  <- function(..., minval = 0, maxval = 70) pmax(minval, pmin(maxval, ...))
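A minimal sketch of how such helpers plug into the grouped summary from the question, assuming dplyr 1.0.0 or later (across() stands in for summarise_each()). The two helpers are redefined so the snippet runs on its own, and the result name res_helpers is illustrative.

```r
library(dplyr)

# Data frame from the question
md <- data.frame(x = c(3,5,4,5,3,5), y = c(5,5,5,4,4,1), z = c(1,3,4,3,5,5),
      device1 = c("c","a","a","b","c","c"), device2 = c("B","A","A","A","B","B"))
md[2,3] <- NA
md[4,1] <- NA

# NA-aware helpers, as in the answer
mean_ <- function(...) mean(..., na.rm = TRUE)
sum_  <- function(...) sum(..., na.rm = TRUE)

# Helpers are passed by name; no na.rm argument is threaded through the call.
# A named list yields columns such as x_mean and x_sum.
res_helpers <- md %>%
  group_by(device1, device2) %>%
  summarise(across(everything(), list(mean = mean_, sum = sum_)), .groups = "drop")

res_helpers
```

One thing to watch with this style: for a group whose values are all NA, sum_ returns 0 while mean_ returns NaN, so the two helpers signal "no data" differently.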

Really, you want to be able to flick one global switch once and for all, along the lines of na.action with na.pass/na.omit/na.fail, to tell functions what their default NA behavior should be, instead of having them throw errors or behave inconsistently across different packages, as they currently do.
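To illustrate the inconsistency, here is a small base R sketch showing that the existing na.action option is honoured by a modelling function such as lm() but has no effect on mean(). The data frame df and the variable names are made up for this example.

```r
# Illustrative data (not from the original post): one NA in x
df <- data.frame(x = c(1, 2, NA, 4), y = c(2, 4, 6, 8))

# Set the option explicitly so the example does not depend on session state;
# "na.omit" is also the default in a fresh R session.
old <- options(na.action = "na.omit")

fit <- lm(y ~ x, data = df)   # lm() consults the option and drops the NA row
n_used <- nobs(fit)           # number of rows actually used in the fit

m_default <- mean(df$x)             # mean() ignores the option and returns NA
m_narm <- mean(df$x, na.rm = TRUE)  # NA handling must be requested per call

options(old)  # restore the previous setting
```

So the global switch exists only for functions that choose to consult it, which is why wrappers like mean_ remain a practical workaround.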

There used to be a CRAN package called Defaults for setting per-function defaults, but it has not been maintained since 2014 (pre-3.x). For more about it, see "Setting Function Defaults R on a Project Specific Basis".
