Aggregate with na.action=na.pass gives unexpected answer

Question

I use the following data.frame as an example:

d <- data.frame(x=c(1,NA), y=c(2,3))

I'd like to sum the values of y by the variable x. Since no value of x is repeated, I would expect the aggregation to return the original data.frame, with NA treated as a group of its own. Instead, aggregate gives me the following result:

> aggregate(y ~ x, data=d, FUN=sum)
  x y
1 1 2

I've read the documentation on changing the default na.action, but doing so doesn't seem to make any difference:

> aggregate(y ~ x, data=d, FUN=sum, na.action=na.pass)
  x y
1 1 2

What is going on? I don't seem to understand what na.pass is doing in this case. Is there a way to accomplish what I want in R? Any help would be greatly appreciated.

Recommended answer

aggregate makes use of tapply, which in turn calls factor on its grouping variable.

But look at what happens to NA values in factor:

factor(c(1, 2, NA))
# [1] 1    2    <NA>
# Levels: 1 2

Note the levels: NA is not one of them. You can use addNA to keep NA as a level:

addNA(factor(c(1, 2, NA)))
# [1] 1    2    <NA>
# Levels: 1 2 <NA>
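
To tie this back to grouping, here is a minimal sketch using tapply directly on the vectors from the question. The variable names sums_default and sums_with_na are made up for illustration; the point is only that a value whose group is NA has nowhere to go unless NA is a level.

```r
x <- c(1, NA)
y <- c(2, 3)

# Default factor conversion: NA is not a level, so its row has no group.
sums_default <- tapply(y, x, sum)

# addNA() turns NA into a level, so it forms a group of its own.
sums_with_na <- tapply(y, addNA(x), sum)

length(sums_default)  # 1
length(sums_with_na)  # 2

# factor(..., exclude = NULL) is another way to keep NA as a level.
na_levels <- levels(factor(x, exclude = NULL))
```

Both addNA and factor(..., exclude = NULL) should give the same set of levels here, so which one to use is mostly a matter of taste.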

Thus, you would probably need to do something like:

aggregate(y ~ addNA(x), d, sum)
#   addNA(x) y
# 1        1 2
# 2     <NA> 3

Or something like:

d$x <- addNA(factor(d$x))
str(d)
# 'data.frame': 2 obs. of  2 variables:
#  $ x: Factor w/ 2 levels "1",NA: 1 2
#  $ y: num  2 3
aggregate(y ~ x, d, sum)
#      x y
# 1    1 2
# 2 <NA> 3
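
As for what na.pass itself does in the original call, the following sketch may help. It assumes that the formula method applies na.action while building the model frame (via model.frame), and that the grouping step afterwards still has no NA level to assign the row to. The names kept_omit, kept_pass and res are made up for illustration.

```r
d <- data.frame(x = c(1, NA), y = c(2, 3))

# na.action is applied when the model frame is built from the formula.
kept_omit <- model.frame(y ~ x, data = d, na.action = na.omit)
kept_pass <- model.frame(y ~ x, data = d, na.action = na.pass)

nrow(kept_omit)  # 1: the row with x = NA is removed up front
nrow(kept_pass)  # 2: the row survives this step ...

# ... but the grouping step still has no NA group to put it in,
# so the aggregated result has a single row, as in the question.
res <- aggregate(y ~ x, data = d, FUN = sum, na.action = na.pass)
nrow(res)        # 1
```

In other words, na.pass does let the row through; it is the grouping on a variable without an NA level that drops it, which is why addNA is needed.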


(Alternatively, switch to something like "data.table", which is not only faster than aggregate but also handles NA values more consistently. There is no need to keep track of whether you're using the formula method of aggregate or not.)

library(data.table)
as.data.table(d)[, sum(y), by = x]
#     x V1
# 1:  1  2
# 2: NA  3
