NA when trying to summarize a subset of data (R)
Question
The whole vector is fine and has no NAs:
> summary(data$marks)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 6.00 6.00 6.02 7.00 7.00
> length(data$marks)
[1] 2528
However, when I try to summarize a subset selected by a condition, I get lots of NAs:
> summary(data[data$student=="John",]$marks)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
1.000 6.000 6.000 6.169 7.000 7.000 464
> length(data[data$student=="John",]$marks)
[1] 523
Recommended answer
I think the problem is that you have missing values for student. As a result, when you subset by student, every row where student is NA produces an NA for marks in the result: the comparison data$student=="John" evaluates to NA for those rows, and indexing a data frame with NA returns a row of NAs. Wrap the subsetting condition in which() to avoid this problem. Here are a few examples that will hopefully clarify what's happening:
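# Fake data
set.seed(103)
dat = data.frame(group=rep(LETTERS[1:3], each=3),
                 value=rnorm(9))
dat$group[1] = NA                     # introduce one missing group label

dat$value                             # all nine values, none missing
dat[dat$group=="B", "value"]          # the NA in group adds an NA to the result
dat[which(dat$group=="B"), "value"]   # which() drops the NA, leaving only group B

# Simpler example
x = c(10,20,30,40, NA)
x>20              # FALSE FALSE TRUE TRUE NA
x[x>20]           # 30 40 NA
which(x>20)       # 3 4
x[which(x>20)]    # 30 40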
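If this diagnosis is right, the numbers in the question fit it: 523 rows in the subset with 464 NAs would mean 464 rows with a missing student and 59 rows that really belong to John. The sketch below shows one way to check for missing values in the grouping column and compares which() with two other NA-safe options, %in% and subset(). The data frame is a small made-up stand-in for the asker's data, so its values are assumptions for illustration only.

```r
# Hypothetical stand-in for the asker's `data`; the values are made up.
data <- data.frame(
  student = c("John", NA, "Mary", "John", NA),
  marks   = c(7, 6, 5, 6, 1),
  stringsAsFactors = FALSE
)

# Count the rows where the grouping column is missing.
n_missing <- sum(is.na(data$student))

# Plain logical indexing returns one NA for every NA in the condition.
naive <- data[data$student == "John", "marks"]

# Three ways to keep only the rows where the condition is TRUE:
with_which  <- data[which(data$student == "John"), "marks"]  # positions of TRUE only
with_in     <- data[data$student %in% "John", "marks"]       # %in% never returns NA
with_subset <- subset(data, student == "John")$marks         # subset() treats NA as FALSE
```

All three alternatives return the same marks here. which() is the smallest change to the original expression, while subset() may read more naturally in interactive use.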