aggregate methods treat missing values (NA) differently
Question
Here's a simple data frame with a missing value:
M = data.frame( Name = c('name', 'name'), Col1 = c(NA, 1) , Col2 = c(1, 1))
# Name Col1 Col2
# 1 name NA 1
# 2 name 1 1
When I use aggregate to sum variables by group ('Name') using the formula method:
aggregate(. ~ Name, M, FUN = sum, na.rm = TRUE)
the result is:
# RowName Col1 Col2
# name 1 1
So the entire first row, which has an NA, is ignored. But if I use the "non-formula" specification:
aggregate(M[, 2:3], by = list(M$Name), FUN = sum, na.rm = TRUE)
the result is:
# Group.1 Col1 Col2
# name 1 2
Here only the (1,1) entry is ignored.
This caused a major debugging headache in one of my scripts, since I thought these two calls were equivalent. Is there a good reason why the formula method treats NA values differently?
Thanks.
Recommended answer
Good question, but in my opinion this shouldn't have caused a major debugging headache, because it is documented quite clearly in multiple places in the manual page for aggregate.
First, in the "Usage" section:
## S3 method for class 'formula'
aggregate(formula, data, FUN, ...,
subset, na.action = na.omit)
Later, in the description:
na.action: a function which indicates what should happen when the data contain NA values. The default is to ignore missing values in the given variables.
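To see why the formula call returns Col2 = 1, it may help to apply the default na.action by hand. This is a minimal sketch using the data frame M from the question; the names kept, res_formula and res_manual are only illustrative.

```r
M <- data.frame(Name = c('name', 'name'), Col1 = c(NA, 1), Col2 = c(1, 1))

# na.omit() drops every row that contains at least one NA,
# so the whole first row is gone before any summing happens.
kept <- na.omit(M)

# Formula method with its default na.action = na.omit.
res_formula <- aggregate(. ~ Name, M, FUN = sum, na.rm = TRUE)

# Doing the same thing manually: aggregate only the rows that survived.
res_manual <- aggregate(kept[, 2:3], by = list(Name = kept$Name), FUN = sum)
```

Both res_formula and res_manual should hold a single row with Col1 = 1 and Col2 = 1, which suggests that na.rm = TRUE never gets a chance to act: the NA has already been removed along with the rest of its row.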
I can't answer why the formula method was written differently (that's something the function authors would have to answer), but using the above information, you can probably use the following:
aggregate(.~Name, M, FUN=sum, na.rm=TRUE, na.action=NULL)
# Name Col1 Col2
# 1 name 1 2
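As an alternative sketch (not part of the original answer), passing na.action = na.pass should have the same effect: the rows with NA are kept, and na.rm = TRUE is then left to handle them inside sum. The names res_pass and res_df are illustrative.

```r
M <- data.frame(Name = c('name', 'name'), Col1 = c(NA, 1), Col2 = c(1, 1))

# na.pass returns the data unchanged, so no rows are dropped up front;
# na.rm = TRUE is forwarded to sum() and removes the NA per column.
res_pass <- aggregate(. ~ Name, M, FUN = sum, na.rm = TRUE, na.action = na.pass)

# The data-frame method from the question, for comparison.
res_df <- aggregate(M[, 2:3], by = list(Name = M$Name), FUN = sum, na.rm = TRUE)
```

With either call, Col1 sums to 1 and Col2 sums to 2, so the formula and non-formula specifications agree once the row-wise omission is switched off.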