NA when trying to summarize a subset of data (R)


Problem description

The whole vector is fine and has no NAs:

> summary(data$marks)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00    6.00    6.00    6.02    7.00    7.00

> length(data$marks)
[1] 2528

However, when I summarize a subset selected by a criterion, I get lots of NAs:

> summary(data[data$student=="John",]$marks)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  1.000   6.000   6.000   6.169   7.000   7.000     464

> length(data[data$student=="John",]$marks)
[1] 523

Recommended answer

I think the problem is that you have missing values in student. Comparing NA with == returns NA rather than FALSE, and indexing rows with NA returns a row of NAs, so every missing student value produces an NA in marks when you take the subset. Wrap the subsetting condition in which() to avoid this problem, since which() returns only the positions where the condition is TRUE. Here are a few examples that will hopefully clarify what's happening:

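# Fake data: three groups of three values, with the first group label set to NA
set.seed(103)
dat = data.frame(group=rep(LETTERS[1:3], each=3), 
                 value=rnorm(9))
dat$group[1] = NA

dat$value                             # all nine values, no NAs
dat[dat$group=="B", "value"]          # an extra NA appears for the row where group is NA
dat[which(dat$group=="B"), "value"]   # only the three "B" values

# Simpler example
x = c(10,20,30,40, NA)

x>20              # FALSE FALSE  TRUE  TRUE    NA
x[x>20]           # 30 40 NA  (the NA in the condition returns NA)

which(x>20)       # 3 4  (which() keeps only the TRUE positions)
x[which(x>20)]    # 30 40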
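
Applied to the question's situation, the same idea might look like the sketch below. The data frame here is made up for illustration, since the real one is not shown in the question; only the column names student and marks come from the original post. It counts the missing student values, then compares the plain logical subset with which() and with two other base R options, %in% and subset(), which also drop the rows where the condition is NA.

```r
# Hypothetical stand-in for the asker's data: marks has no NAs,
# but two student names are missing.
data <- data.frame(
  student = c("John", NA, "Mary", "John", NA, "Mary"),
  marks   = c(7, 6, 5, 6, 7, 1)
)

# How many student values are missing? Each one becomes an NA row below.
n_missing <- sum(is.na(data$student))

# Plain logical subsetting: an NA in the condition yields an all-NA row.
naive <- data[data$student == "John", ]$marks

# Three ways to drop the rows where the condition is NA:
with_which  <- data[which(data$student == "John"), ]$marks
with_in     <- data[data$student %in% "John", ]$marks
with_subset <- subset(data, student == "John")$marks

summary(naive)       # includes an NA's column
summary(with_which)  # no NA's column
```

If the diagnosis above is right, running sum(is.na(data$student)) on the real data should return 464, matching the NA's count in the summary, because marks itself has no missing values.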
