How to delete rows from a dataframe that contain n*NA

Views: 235   Published: 2017/11/8 19:36:03   Tags: r, filter, merge, rows, na


Problem description

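I have a number of large datasets with ~10 columns and ~200,000 rows. Not all columns contain values for each row, although at least one column must contain a value for the row to be present. I would like to set a threshold for how many NAs are allowed in a row.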

My data frame looks something like this:

  ID  q  r  s  t  u  v  w  x  y  z
  A   1  5 NA  3  8  9 NA  8  6  4
  B   5 NA  4  6  1  9  7  4  9  3
  C  NA  9  4 NA  4  8  4 NA  5 NA
  D   2  2  6  8  4 NA  3  7  1 32

I would like to be able to delete the rows that contain more than 2 NA cells, to get:

  ID  q  r  s  t  u  v  w  x  y  z
  A   1  5 NA  3  8  9 NA  8  6  4
  B   5 NA  4  6  1  9  7  4  9  3
  D   2  2  6  8  4 NA  3  7  1 32

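complete.cases removes all rows containing any NA, and I know one can delete rows that contain NA in certain columns. Is there a way to modify this so that it doesn't care which columns contain NA, only how many of the total do?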
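To see why complete.cases() is too strict here, the sketch below rebuilds the sample table and applies it. Reading the table with read.table(text = ...) is an illustration choice, not the asker's code. Every row in the sample has at least one NA, so nothing survives the filter.

```r
# Sample table from the question, read from inline text (illustration only)
df <- read.table(text = "
ID  q  r  s  t  u  v  w  x  y  z
A   1  5 NA  3  8  9 NA  8  6  4
B   5 NA  4  6  1  9  7  4  9  3
C  NA  9  4 NA  4  8  4 NA  5 NA
D   2  2  6  8  4 NA  3  7  1 32
", header = TRUE, stringsAsFactors = FALSE)

# complete.cases() is TRUE only for rows with no NA in any column
cc <- complete.cases(df)
print(cc)               # FALSE FALSE FALSE FALSE

# Every row has at least one NA, so this filter keeps no rows at all
print(nrow(df[cc, ]))   # 0
```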

Alternatively, this data frame is generated by merging several data frames using:

  # args is assumed to hold command-line arguments, e.g. args <- commandArgs(trailingOnly = TRUE)
  file1 <- read.delim("~/file1.txt")
  file2 <- read.delim(file = args[1])
  file1 <- merge(file1, file2, by = "chr.pos", all = TRUE)
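A minimal sketch of where the NAs come from, assuming two hypothetical inputs that share the chr.pos key (the keys and values below are made up). With all = TRUE, merge() performs a full outer join: a key present in only one input gets NA in the other input's columns, so merging several files this way likely accounts for the scattered NAs.

```r
# Two small hypothetical inputs that share the key column chr.pos
file1 <- data.frame(chr.pos = c("1:100", "1:200", "2:300"), q = c(1, 5, 2),
                    stringsAsFactors = FALSE)
file2 <- data.frame(chr.pos = c("1:200", "2:300", "3:400"), r = c(5, 9, 2),
                    stringsAsFactors = FALSE)

# all = TRUE keeps keys found in either input (full outer join);
# a key missing from one input gets NA in that input's columns
merged <- merge(file1, file2, by = "chr.pos", all = TRUE)
print(merged)

# Per-row NA count: "1:100" lacks r, "3:400" lacks q
print(rowSums(is.na(merged)))
```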

Perhaps the merge function could be altered?

Thanks

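Solution

Use rowSums. To remove rows from a data frame (df) that contain precisely n NA values:

  df <- df[rowSums(is.na(df)) != n, ]

or to remove rows that contain n or more NA values:

  df <- df[rowSums(is.na(df)) < n, ]

In both cases, of course, replace n with the number that's required.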
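Applied to the sample table from the question, "more than 2 NAs" means 3 or more, so the `< n` form with n = 3 (equivalently `<= 2`) keeps rows A, B and D. This is a sketch with the table rebuilt inline via read.table() for illustration. Note that is.na(df) also scans the ID column; it has no NA here, but with real merged data you may prefer to count only the value columns, e.g. is.na(df[-1]).

```r
# Sample table from the question, read from inline text (illustration only)
df <- read.table(text = "
ID  q  r  s  t  u  v  w  x  y  z
A   1  5 NA  3  8  9 NA  8  6  4
B   5 NA  4  6  1  9  7  4  9  3
C  NA  9  4 NA  4  8  4 NA  5 NA
D   2  2  6  8  4 NA  3  7  1 32
", header = TRUE, stringsAsFactors = FALSE)

# is.na() on a data frame returns a logical matrix; rowSums() counts TRUEs per row
na_count <- rowSums(is.na(df))
print(na_count)    # 2 1 4 1

# "More than 2 NAs" = 3 or more, so keep rows with fewer than n = 3
n <- 3
kept <- df[na_count < n, ]
print(kept$ID)     # "A" "B" "D"

# Exact-count variant: drop only rows with exactly 4 NAs (here also just row C)
exact <- df[na_count != 4, ]
print(exact$ID)    # "A" "B" "D"
```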
