Blend of na.omit and na.pass using aggregate?

Views: 118 | Posted: 2020/5/28 20:24:28 | Tags: r, aggregate, plyr, summary


Question


I have a data set containing product prototype test data. Not all tests were run on all lots, and not all tests were executed with the same sample sizes. To illustrate, consider this case:

> test <- data.frame(name = rep(c("A", "B", "C"), each = 4),
  var1 = rep(c(1:3, NA), 3),
  var2 = 1:12,
  var3 = c(rep(NA, 4), 1:8))

> test
   name var1 var2 var3
1     A    1    1   NA
2     A    2    2   NA
3     A    3    3   NA
4     A   NA    4   NA
5     B    1    5    1
6     B    2    6    2
7     B    3    7    3
8     B   NA    8    4
9     C    1    9    5
10    C    2   10    6
11    C    3   11    7
12    C   NA   12    8


In the past, I've only had to deal with cases of mismatched repetitions, which has been easy with aggregate(cbind(var1, var2) ~ name, test, FUN = mean, na.action = na.omit) (or the default setting, since na.omit is the default na.action). I'll get averages for each lot, although na.omit drops whole rows, so var2 is averaged over the same three rows as var1 rather than over all four of its values.


Unfortunately, this will leave me with a dataset completely missing lot A in this case:

> aggregate(cbind(var1, var2, var3) ~ name, test, FUN = mean, na.action = na.omit)
  name var1 var2 var3
1    B    2    6    2
2    C    2   10    6
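Lot A disappears because, in the formula method, na.action is applied to the whole model frame before the data are split by name: any row with an NA in var1, var2 or var3 is removed, and every row of lot A has var3 = NA. A minimal sketch that rebuilds the question's test data and makes the row-wise deletion visible:

```r
test <- data.frame(name = rep(c("A", "B", "C"), each = 4),
                   var1 = rep(c(1:3, NA), 3),
                   var2 = 1:12,
                   var3 = c(rep(NA, 4), 1:8))

# na.omit() on the full frame mirrors what the formula method does:
# a row survives only if var1, var2 AND var3 are all non-missing
kept <- na.omit(test)
kept$name   # no "A" left: var3 is NA in every row of lot A

# The surviving rows also explain the var2 means of 6 and 10:
# they are averages of rows 5-7 and 9-11 only, not of all four rows per lot
res <- aggregate(cbind(var1, var2, var3) ~ name, test,
                 FUN = mean, na.action = na.omit)
res
```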


If I use na.pass, however, I also don't get what I want:

> aggregate(cbind(var1, var2, var3) ~ name, test, FUN = mean, na.action = na.pass)
  name var1 var2 var3
1    A   NA  2.5   NA
2    B   NA  6.5  2.5
3    C   NA 10.5  6.5


Now I lose the good data I had in var1 since it contained instances of NA.
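One way to get close to this with aggregate() itself, assuming a NaN (rather than NA) is acceptable for a lot whose column is entirely missing: keep na.action = na.pass so no rows are dropped, and add na.rm = TRUE, which is not an aggregate() argument and is therefore forwarded through `...` to FUN. A sketch using the question's data:

```r
test <- data.frame(name = rep(c("A", "B", "C"), each = 4),
                   var1 = rep(c(1:3, NA), 3),
                   var2 = 1:12,
                   var3 = c(rep(NA, 4), 1:8))

# na.pass keeps every row; na.rm = TRUE is passed on to mean(),
# so NAs are skipped within each lot instead of across the whole frame
res <- aggregate(cbind(var1, var2, var3) ~ name, test,
                 FUN = mean, na.rm = TRUE, na.action = na.pass)
res
# lot A: var1 = 2, var2 = 2.5, var3 = NaN (mean of zero non-missing values)
```

Where every non-grouping column should be summarised, `. ~ name` can replace the `cbind(var1, var2, var3)` left-hand side.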

What I want is:


- NA as the output of mean() if every value of varN for a given name is NA
- The output of mean() over the non-missing values if there are one or more actual values of varN for that name
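Note that mean(x, na.rm = TRUE) on an all-NA vector returns NaN, not NA. To follow the two rules above exactly, a small wrapper can return NA_real_ when every value in the group is missing and the usual mean otherwise. This is a sketch; mean_or_na is a hypothetical helper name, not part of base R:

```r
test <- data.frame(name = rep(c("A", "B", "C"), each = 4),
                   var1 = rep(c(1:3, NA), 3),
                   var2 = 1:12,
                   var3 = c(rep(NA, 4), 1:8))

# Hypothetical helper: NA if the group has no real values,
# otherwise the mean of the non-missing values
mean_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)
}

# na.pass keeps all rows so each column is judged on its own values
res <- aggregate(cbind(var1, var2, var3) ~ name, test,
                 FUN = mean_or_na, na.action = na.pass)
res
# lot A: var1 = 2, var2 = 2.5, var3 = NA
```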


I'm guessing this is pretty simple, but I just don't know how. Do I need to use ddply for something like this? If so... the reason I tend to avoid it is that I end up writing really long equivalents to aggregate() like so:

library(plyr)
# one mean() call per column; na.rm = TRUE skips NAs within each lot
ddply(test, .(name), summarise,
  var1 = mean(var1, na.rm = TRUE),
  var2 = mean(var2, na.rm = TRUE),
  var3 = mean(var3, na.rm = TRUE))


Yeah... so the result of that apparently does what I want. I'll leave the question anyway in case there's 1) a way to do this with aggregate() or 2) shorter syntax for ddply.
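For the shorter ddply syntax, plyr's numcolwise() turns a summary function into one that is applied to every numeric column of each group, with extra arguments forwarded to that function, so the three summarise() lines collapse into one call. A sketch, assuming plyr is installed; as with the long form, lot A's var3 comes out as NaN rather than NA:

```r
library(plyr)

test <- data.frame(name = rep(c("A", "B", "C"), each = 4),
                   var1 = rep(c(1:3, NA), 3),
                   var2 = 1:12,
                   var3 = c(rep(NA, 4), 1:8))

# numcolwise() skips the non-numeric name column; ddply() adds it back
# as the grouping column. na.rm = TRUE is passed on to mean().
res <- ddply(test, .(name), numcolwise(mean, na.rm = TRUE))
res
```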
