Blend of na.omit and na.pass using aggregate?

Question

I have a data set containing product prototype test data. Not all tests were run on all lots, and not all tests were executed with the same sample sizes. To illustrate, consider this case:

> test <- data.frame(name = rep(c("A", "B", "C"), each = 4),
  var1 = rep(c(1:3, NA), 3),
  var2 = 1:12,
  var3 = c(rep(NA, 4), 1:8))

> test
   name var1 var2 var3
1     A    1    1   NA
2     A    2    2   NA
3     A    3    3   NA
4     A   NA    4   NA
5     B    1    5    1
6     B    2    6    2
7     B    3    7    3
8     B   NA    8    4
9     C    1    9    5
10    C    2   10    6
11    C    3   11    7
12    C   NA   12    8

In the past, I've only had to deal with cases of mis-matched repetitions, which has been easy with aggregate(cbind(var1, var2) ~ name, test, FUN = mean, na.action = na.omit) (or the default setting). I'll get averages for each lot over three values for var1 and over four values for var2.

Unfortunately, this will leave me with a dataset completely missing lot A in this case:

 aggregate(cbind(var1, var2, var3) ~ name, test, FUN = mean, na.action = na.omit)
  name var1 var2 var3
1    B    2    6    2
2    C    2   10    6
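
The missing lot follows from how the formula interface of aggregate() applies na.action: rows are filtered as a whole before grouping, so an NA in any one column removes that row for every column. A minimal sketch using the test data from above (the name kept is only for illustration):

```r
# Rebuild the example data from the question.
test <- data.frame(name = rep(c("A", "B", "C"), each = 4),
                   var1 = rep(c(1:3, NA), 3),
                   var2 = 1:12,
                   var3 = c(rep(NA, 4), 1:8))

# na.omit() drops every row that has an NA in any column.
kept <- na.omit(test)

# Lot A has NA in var3 on all four of its rows, so none of them survive.
unique(as.character(kept$name))

# Each remaining lot keeps only three rows, so var2 is also averaged
# over three values (6 for lot B rather than 6.5).
table(as.character(kept$name))
```

This is also why var2 shows 6 and 10 in the output above instead of 6.5 and 10.5.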

If I use na.pass, however, I also don't get what I want:

 aggregate(cbind(var1, var2, var3) ~ name, test, FUN = mean, na.action = na.pass)
  name var1 var2 var3
1    A   NA  2.5   NA
2    B   NA  6.5  2.5
3    C   NA 10.5  6.5

Now I lose the good data I had in var1 since it contained instances of NA.
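
The NA in var1 comes from mean() itself rather than from aggregate(): with na.pass the rows are kept, and mean() returns NA whenever its input contains an NA unless na.rm = TRUE is set. A quick illustration with the var1 values of a single lot:

```r
# The four var1 values of any one lot in the example data.
x <- c(1, 2, 3, NA)

mean(x)                # NA: a single missing value propagates
mean(x, na.rm = TRUE)  # 2: the NA is dropped before averaging
```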

What I want is:

  • NA as the output of mean() if all unique combinations of varN ~ name are NAs
  • Output of mean() if there are one or more actual values for varN ~ name

I'm guessing this is pretty simple, but I just don't know how. Do I need to use ddply for something like this? If so... the reason I tend to avoid it is that I end up writing really long equivalents to aggregate() like so:

library(plyr)  # provides ddply(), .() and summarise()
ddply(test, .(name), summarise,
  var1 = mean(var1, na.rm = TRUE),
  var2 = mean(var2, na.rm = TRUE),
  var3 = mean(var3, na.rm = TRUE))

Yeah... so the result of that apparently does what I want. I'll leave the question anyway in case there's 1) a way to do this with aggregate() or 2) shorter syntax for ddply.

Recommended answer

Pass both na.action=na.pass and na.rm=TRUE to aggregate. The former tells aggregate not to drop rows that contain NAs; the latter is forwarded through aggregate's ... argument to mean, which then ignores the NAs within each group.

aggregate(cbind(var1, var2, var3) ~ name, test, mean,
          na.action=na.pass, na.rm=TRUE)
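
One detail worth checking against the stated goal: when every value in a group is missing, mean(..., na.rm = TRUE) returns NaN rather than NA. is.na() is TRUE for NaN, so this often does not matter, but if a literal NA is wanted, a small wrapper can be passed as FUN. The sketch below is one possible way to do that; mean_or_na is a hypothetical helper name, not part of base R.

```r
# Rebuild the example data from the question.
test <- data.frame(name = rep(c("A", "B", "C"), each = 4),
                   var1 = rep(c(1:3, NA), 3),
                   var2 = 1:12,
                   var3 = c(rep(NA, 4), 1:8))

# With na.rm = TRUE, an all-NA input yields NaN, not NA.
is.nan(mean(c(NA_real_, NA_real_), na.rm = TRUE))

# Hypothetical helper: NA if the group has no observed values,
# otherwise the mean of the observed values.
mean_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)
}

# na.pass keeps rows with NAs; the helper handles them per column and group.
res <- aggregate(cbind(var1, var2, var3) ~ name, test, mean_or_na,
                 na.action = na.pass)
res
```

Lot A then appears with means for var1 and var2 and a plain NA for var3, while lots B and C are averaged over all of their observed values.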
