Subsetting R data frame results in mysterious NA rows


Question

I've been encountering what I think is a bug. It's not a big deal, but I'm curious if anyone else has seen this. Unfortunately, my data is confidential, so I have to make up an example, and it's not going to be very helpful.

When subsetting my data, I occasionally get mysterious NA rows that aren't in my original data frame. Even the row names are NA. For example:

example <- data.frame("var1"=c("A", "B", "A"), "var2"=c("X", "Y", "Z"))
example

  var1 var2
1    A    X
2    B    Y
3    A    Z

Then I run:

example[example$var1=="A",]

   var1 var2
1     A    X
3     A    Z
NA <NA> <NA>

Of course, the example above does not actually give you this mysterious NA row; I am adding it here to illustrate the problem I'm having with my data.

Maybe it has to do with the fact that I'm importing my original data set using Google's read.xlsx package and then executing a wide-to-long reshape before subsetting.

Thanks

Recommended answer

Wrap the condition in which():

df[which(df$number1 < df$number2), ]


How it works:

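which() returns the row numbers where the condition is TRUE and silently drops the positions where the condition is FALSE or NA. The data frame is then subset on those row numbers only. A bare logical index behaves differently: every NA in it produces a row filled with NA, including an NA row name.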
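
Here is a minimal sketch of the difference. It assumes the mysterious rows come from NA values in the column being compared, which the question does not confirm. The data below is made up for illustration (the question's example with one NA added to var1).

```r
# Hypothetical data: var1 contains one missing value, as might
# happen after importing or reshaping (an assumption, not confirmed).
example <- data.frame(
  var1 = c("A", "B", NA, "A"),
  var2 = c("X", "Y", "W", "Z"),
  stringsAsFactors = FALSE
)

# Comparing NA with "A" yields NA, so the condition is not purely TRUE/FALSE.
cond <- example$var1 == "A"

# Logical indexing: the NA in cond produces an all-NA row.
with_na <- example[cond, ]

# which() keeps only the positions where cond is TRUE.
without_na <- example[which(cond), ]

print(with_na)
print(without_na)
```

If this is the cause, sum(is.na(example$var1)) will be greater than zero on the real data. subset(example, var1 == "A") also drops rows where the condition is NA.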

Say that:

which(df$number1 < df$number2)

returns the row numbers 1 2 3 4 5.

Then writing:

df[which(df$number1 < df$number2), ]

is the same as writing:

df[c(1, 2, 3, 4, 5), ]

or, more simply:

df[1:5, ]
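
The equivalence above can be checked with a small sketch. The data frame here is hypothetical: the answer does not show df, so the values are chosen so that rows 1 to 5 satisfy the condition and row 6 has a missing value.

```r
# Hypothetical df: rows 1-5 satisfy number1 < number2; row 6 has an NA.
df <- data.frame(
  number1 = c(1, 2, 3, 4, 5, NA),
  number2 = c(2, 3, 4, 5, 6, 1)
)

# Row positions where the condition is TRUE (the NA in row 6 is dropped).
idx <- which(df$number1 < df$number2)
print(idx)

# Compare the which() subset with the two explicit index forms.
same_as_vector <- isTRUE(all.equal(df[idx, ], df[c(1, 2, 3, 4, 5), ]))
same_as_range  <- isTRUE(all.equal(df[idx, ], df[1:5, ]))
print(c(same_as_vector, same_as_range))
```

Without which(), df[df$number1 < df$number2, ] on this data would return a sixth row filled with NA.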
