How to subset data in R without losing NA rows?


Question

I have some data that I am looking at in R. One particular column, named "Height", contains a few NA values.

I want to subset my data frame so that all Heights above a certain value are excluded from my analysis.

df2 <- subset(df1, Height < 40)

However, whenever I do this, R automatically removes all rows that have NA in Height. I do not want this. I have tried adding an na.rm argument:

f1 <- function(x, na.rm = FALSE) {
  df2 <- subset(x, Height < 40)
}
f1(df1, na.rm = FALSE)

But this does not seem to do anything; the rows with NA still disappear from my data frame. Is there a way to subset my data like this without losing the NA rows?

Recommended answer

If we decide to use the subset function, we need to be careful. Its documentation (?subset) states:

For ordinary vectors, the result is simply ‘x[subset & !is.na(subset)]’.

So elements (or rows of a data frame) for which the condition evaluates to NA are dropped, just like those for which it is FALSE.
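As a quick check of that rule, here is a minimal sketch on a plain vector. The vector h and the variable names are made up for illustration and are not part of the original question.

```r
# Hypothetical vector of heights with two missing values
h <- c(NA, 2, 4, NA, 50, 60)

# The comparison itself propagates NA
cond <- h < 40

# subset() on a vector treats NA in the condition as FALSE ...
kept_by_subset <- subset(h, cond)

# ... which matches the expression quoted from the documentation
kept_by_hand <- h[cond & !is.na(cond)]

identical(kept_by_subset, kept_by_hand)
# [1] TRUE
```

Both versions keep only 2 and 4; the two NA entries are silently dropped.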

If you want to keep the NA cases, add a logical "or" condition that tells R not to drop them:

subset(df1, Height < 40 | is.na(Height))
# or `df1[df1$Height < 40 | is.na(df1$Height), ]`

Do not use the bare comparison directly as a row index (explained below):

df2 <- df1[df1$Height < 40, ]

Example

df1 <- data.frame(Height = c(NA, 2, 4, NA, 50, 60), y = 1:6)

subset(df1, Height < 40 | is.na(Height))

#  Height y
#1     NA 1
#2      2 2
#3      4 3
#4     NA 4

df1[df1$Height < 40, ]

#     Height  y
#NA       NA NA
#2         2  2
#3         4  3
#NA.1     NA NA

The latter fails because indexing by NA gives NA: each NA in the logical index produces a row filled entirely with NA instead of the original row. Consider this simple example with a vector:

x <- 1:4
ind <- c(NA, TRUE, NA, FALSE)
x[ind]
# [1] NA  2 NA

We need to somehow replace those NA with TRUE. The most straightforward way is to add another "or" condition, is.na(ind):

x[ind | is.na(ind)]
# [1] 1 2 3

This is exactly what happens in your situation. If Height contains NA, the comparison Height < 40 yields a mix of TRUE / FALSE / NA, so we need to replace NA with TRUE as above.
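To make the intermediate step visible, the sketch below builds the logical index for the example data frame in two stages. The names keep and keep_with_na are illustrative only.

```r
df1 <- data.frame(Height = c(NA, 2, 4, NA, 50, 60), y = 1:6)

# Stage 1: the raw comparison propagates NA
keep <- df1$Height < 40
# keep is: NA TRUE TRUE NA FALSE FALSE

# Stage 2: turn every NA into TRUE so those rows are retained
keep_with_na <- keep | is.na(df1$Height)
# keep_with_na is: TRUE TRUE TRUE TRUE FALSE FALSE

# The index now contains no NA, so bracket indexing is safe
df2 <- df1[keep_with_na, ]
```

Here df2 holds the first four rows of df1, including the two rows where Height is NA, with their y values intact.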
