How to delete rows from a dataframe that contain n*NA


Problem description

I have a number of large datasets with ~10 columns and ~200000 rows. Not all columns contain a value in every row, although at least one column must contain a value for the row to be present. I would like to set a threshold for how many NAs are allowed in a row.

My Dataframe looks something like this:

 ID q  r  s  t  u  v  w  x  y  z
 A  1  5  NA 3  8  9  NA 8  6  4
 B  5  NA 4  6  1  9  7  4  9  3 
 C  NA 9  4  NA 4  8  4  NA 5  NA
 D  2  2  6  8  4  NA 3  7  1  32 

I would like to delete the rows that contain more than 2 NA cells, to get:

 ID q  r  s  t  u  v  w  x  y  z
 A  1  5  NA 3  8  9  NA 8  6  4
 B  5  NA 4  6  1  9  7  4  9  3
 D  2  2  6  8  4  NA 3  7  1  32

complete.cases removes every row containing any NA, and I know one can delete rows that have NA in certain columns. Is there a way to make the filter non-specific about which columns contain NA, and instead depend on how many of the total columns do?
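A quick way to see why complete.cases is too strict here is to rebuild the example table and compare it with a per-row NA count. The sketch below reads the table from the question with read.table; treating ID as an ordinary column (rather than row names) is an assumption about the real data.

```r
# Rebuild the example from the question; ID is read as an ordinary column
df <- read.table(header = TRUE, text = "
ID q  r  s  t  u  v  w  x  y  z
A  1  5  NA 3  8  9  NA 8  6  4
B  5  NA 4  6  1  9  7  4  9  3
C  NA 9  4  NA 4  8  4  NA 5  NA
D  2  2  6  8  4  NA 3  7  1  32
")

# complete.cases() is TRUE only for rows with no NA at all.
# Every row here has at least one NA, so nothing survives.
keep_complete <- complete.cases(df)
nrow(df[keep_complete, ])  # 0

# Count NAs per row instead: is.na() returns a logical matrix,
# and rowSums() adds up the TRUE values in each row.
na_per_row <- rowSums(is.na(df))
na_per_row  # 2 1 4 1 for rows A, B, C, D
```

The per-row count is what a threshold needs: it ignores which columns are missing and only records how many are.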

Alternatively, this dataframe is generated by merging several dataframes using

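    # Assumed: args holds the script's command-line arguments
    args <- commandArgs(trailingOnly = TRUE)
    file1 <- read.delim("~/file1.txt")
    file2 <- read.delim(file = args[1])

    # Full outer join on the shared "chr.pos" key; unmatched rows get NA
    file1 <- merge(file1, file2, by = "chr.pos", all = TRUE)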

Perhaps the merge function could be altered?
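As far as I know, merge() has no argument for an NA threshold (all = TRUE simply keeps every key from every input), so one approach is to filter immediately after merging. The sketch below is hypothetical: f1, f2, f3, their values, and max_na are made-up stand-ins for the real files. It merges several inputs with Reduce and counts NAs only in the value columns, not in the chr.pos key.

```r
# Three small hypothetical inputs sharing a "chr.pos" key
# (stand-ins for the real files; names and values are made up)
f1 <- data.frame(chr.pos = c("1:100", "1:200", "1:300"), s1 = c(1, 2, 3))
f2 <- data.frame(chr.pos = c("1:200", "1:300", "1:400"), s2 = c(4, 5, 6))
f3 <- data.frame(chr.pos = c("1:300", "1:400", "1:500"), s3 = c(7, 8, 9))

# Full outer join of all inputs; a key missing from an input gets NA
# in that input's columns
merged <- Reduce(function(x, y) merge(x, y, by = "chr.pos", all = TRUE),
                 list(f1, f2, f3))

# Filter afterwards, counting NAs only in the value columns.
max_na <- 1  # hypothetical threshold: keep keys present in at least 2 inputs
value_cols <- setdiff(names(merged), "chr.pos")
merged <- merged[rowSums(is.na(merged[value_cols])) <= max_na, ]
merged$chr.pos  # "1:200" "1:300" "1:400"
```

Here 1:100 and 1:500 each appear in only one input (2 NAs), so they are dropped.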

Thanks

Solution

Use rowSums. To remove rows from a data frame (df) that contain precisely n NA values:

    df <- df[rowSums(is.na(df)) != n, ]

or to remove rows that contain n or more NA values:

    df <- df[rowSums(is.na(df)) < n, ]

In both cases, replace n with the number you need. For the example in the question (remove rows with more than 2 NAs), use the second form with n = 3.
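Both forms can be checked against the question's example. The sketch below rebuilds that table with read.table (ID read as an ordinary column; since it contains no NAs, counting it does not change the totals) and applies each form.

```r
# Rebuild the example data frame from the question
df <- read.table(header = TRUE, text = "
ID q  r  s  t  u  v  w  x  y  z
A  1  5  NA 3  8  9  NA 8  6  4
B  5  NA 4  6  1  9  7  4  9  3
C  NA 9  4  NA 4  8  4  NA 5  NA
D  2  2  6  8  4  NA 3  7  1  32
")

na_per_row <- rowSums(is.na(df))  # 2 1 4 1 for rows A, B, C, D

# Form 1: drop rows with exactly n NAs (n = 2 removes only row A)
n <- 2
exact_removed <- df[na_per_row != n, ]
exact_removed$ID  # "B" "C" "D"

# Form 2: drop rows with n or more NAs. "More than 2" means n = 3,
# which reproduces the output asked for in the question.
n <- 3
at_most_two <- df[na_per_row < n, ]
at_most_two$ID  # "A" "B" "D"
```

Computing na_per_row once lets you try several thresholds without rescanning the data; for ~200000 rows and ~10 columns, is.na() builds a logical matrix of about 2 million entries.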
