Aggregate with na.action=na.pass gives unexpected answer
Question
I use the following data.frame as an example:
d <- data.frame(x=c(1,NA), y=c(2,3))
I'd like to sum the values of y by the variable x. Since no value of x is repeated, I would expect the aggregation to simply return the original data.frame, with NA treated as a group of its own. But aggregate gives me the following result.
>aggregate(y ~ x, data=d, FUN=sum)
x y
1 1 2
I've read the documentation about changing the default na.action, but doing so doesn't seem to change anything.
>aggregate(y ~ x, data=d, FUN=sum, na.action=na.pass)
x y
1 1 2
What is going on? I don't seem to understand what na.pass is doing in this case. Is there an option to accomplish what I want in R? Any help would be greatly appreciated.
Recommended answer
aggregate makes use of tapply, which in turn makes use of factor on its grouping variable.
But look at what happens with NA values in factor:
factor(c(1, 2, NA))
# [1] 1 2 <NA>
# Levels: 1 2
Note the levels: NA is not among them. You can use addNA to keep the NA as a level:
addNA(factor(c(1, 2, NA)))
# [1] 1 2 <NA>
# Levels: 1 2 <NA>
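The same effect can be seen one step up, at the tapply level. The following is a minimal sketch, not part of the original answer; d_raw is a hypothetical name for a fresh copy of the example data, used so that it does not clash with d.

```r
# Fresh copy of the example data (d_raw is an illustrative name)
d_raw <- data.frame(x = c(1, NA), y = c(2, 3))

# tapply converts the grouping vector to a factor, so the NA row
# does not belong to any level and is left out of the result
by_default <- tapply(d_raw$y, d_raw$x, sum)
length(by_default)
# [1] 1

# With addNA, NA becomes a level of its own and gets its own sum
by_addna <- tapply(d_raw$y, addNA(d_raw$x), sum)
length(by_addna)
# [1] 2
```

The second call returns the sums 2 and 3, one per group, which is the grouping the question was hoping for.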
Thus, you would probably need to do something like:
aggregate(y ~ addNA(x), d, sum)
# addNA(x) y
# 1 1 2
# 2 <NA> 3
Or something like:
d$x <- addNA(factor(d$x))
str(d)
# 'data.frame': 2 obs. of 2 variables:
# $ x: Factor w/ 2 levels "1",NA: 1 2
# $ y: num 2 3
aggregate(y ~ x, d, sum)
# x y
# 1 1 2
# 2 <NA> 3
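As for what na.pass itself does in the original call, a short sketch (an illustration added here, with d_raw, mf_omit and mf_pass as hypothetical names) is to look at the model frame that the formula interface builds before any grouping happens. It suggests that na.pass does keep the NA row at that stage, and that the row is only lost later, when the groups are formed.

```r
# Fresh copy of the example data, with x still numeric
d_raw <- data.frame(x = c(1, NA), y = c(2, 3))

# na.omit (the usual default) drops the row with the missing x
mf_omit <- model.frame(y ~ x, data = d_raw, na.action = na.omit)
nrow(mf_omit)
# [1] 1

# na.pass keeps it, so the NA row survives this step
mf_pass <- model.frame(y ~ x, data = d_raw, na.action = na.pass)
nrow(mf_pass)
# [1] 2
```

So changing na.action alone is not enough: the grouping variable also has to carry NA as a level, which is what addNA provides.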
(Alternatively, upgrade to something like "data.table", which is not only faster than aggregate but also gives more consistent behavior with NA values. There is no need to pay heed to whether you're using the formula method of aggregate or not.)
library(data.table)
as.data.table(d)[, sum(y), by = x]
# x V1
# 1: 1 2
# 2: NA 3
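For comparison, if staying in base R, the non-formula interface of aggregate appears to behave the same way as the formula one with respect to NA groups. This is a hedged sketch rather than part of the original answer; d_raw, res_plain and res_addna are hypothetical names.

```r
# Fresh copy of the example data, with x still numeric
d_raw <- data.frame(x = c(1, NA), y = c(2, 3))

# Non-formula interface: rows whose grouping value is NA are omitted
res_plain <- aggregate(d_raw["y"], by = d_raw["x"], FUN = sum)
nrow(res_plain)
# [1] 1

# Wrapping the grouping variable in addNA keeps the NA group
res_addna <- aggregate(d_raw["y"], by = list(x = addNA(d_raw$x)), FUN = sum)
nrow(res_addna)
# [1] 2
```

Either interface therefore needs the addNA step if the NA group should appear in the result.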