Conditional row removal based on the number of NAs within the row


Question

I am looking to remove rows from my dataset based on the following two conditions:

  1. Remove the row if three consecutive cells are NA, or
  2. Remove the row if four or more cells are NA

My sample data:

data <- rbind(c(1,1,2,3,4,2,3,2),
              c(NA,1, NA, 4,1,1,NA,2), 
              c(1,4,6,7,3,1,2,2), 
              c(NA,3, NA, 1,NA,2,NA,NA), 
              c(1,4, NA, NA,NA,4,3,2))

I have searched the existing questions and found that na.omit or complete.cases can remove rows containing NA. Because my removal is conditional, I looked further and found the following code in existing questions:

data[! rowSums(is.na(data)) >4  , ]   
data[! rowSums(is.na(data)) ==3  , ]

The first line fulfills my second condition. The second line does remove rows with three NAs, but it does not check whether they are consecutive, so it removes every row with a total of three NAs. For example:

> data
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    1    2    3    4    2    3    2
[2,]   NA    1   NA    4    1    1   NA    2
[3,]    1    4    6    7    3    1    2    2
[4,]   NA    3   NA    1   NA    2   NA   NA
[5,]    1    4   NA   NA   NA    4    3    2

> data[! rowSums(is.na(data)) ==3  , ]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    1    2    3    4    2    3    2
[2,]    1    4    6    7    3    1    2    2
[3,]   NA    3   NA    1   NA    2   NA   NA
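
To see why the count-based filter cannot separate rows 2 and 5, it may help to look at the per-row NA counts directly. This is only a small sketch on the sample data from the question; the variable name `na_count` is introduced here for illustration.

```r
data <- rbind(c(1, 1, 2, 3, 4, 2, 3, 2),
              c(NA, 1, NA, 4, 1, 1, NA, 2),
              c(1, 4, 6, 7, 3, 1, 2, 2),
              c(NA, 3, NA, 1, NA, 2, NA, NA),
              c(1, 4, NA, NA, NA, 4, 3, 2))

# Number of NAs in each row (na_count is an illustrative name)
na_count <- rowSums(is.na(data))
na_count
# [1] 0 3 0 5 3

# Rows 2 and 5 both contain exactly three NAs, so a filter on the
# count alone treats them the same, regardless of where the NAs sit.
which(na_count == 3)
# [1] 2 5
```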

What I actually want is for only the 5th row to be removed, because it has three consecutive NAs, and not the 2nd row.

Could anyone please advise me on how to overcome this?

Recommended answer

Taking both conditions into account:

data[!apply(is.na(data), 1, function(x) 
  {v <- cumsum(x); any(diff(v, 3) == 3) | 4 %in% v}), ]
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# [1,]    1    1    2    3    4    2    3    2
# [2,]   NA    1   NA    4    1    1   NA    2
# [3,]    1    4    6    7    3    1    2    2

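Here v is the running count of NAs along the row. diff(v, 3) gives the number of NAs in each window of three consecutive cells, starting with the window over columns 2 to 4, so any(diff(v, 3) == 3) is TRUE if NA occurred three times in a row within one of those windows (the difference somewhere is 3). Because v rises by one at a time, 4 %in% v is TRUE exactly when the row has four or more NAs, which corresponds to the second condition.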
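
The intermediate values for row 5 can be traced as below. The sketch also covers one edge case: since a lag-3 difference of the running count starts with the window over columns 2 to 4, a run of NAs occupying exactly the first three columns would not be detected. Prepending a zero to the running count is one possible way to include that window. The helper `drop_row` and the vector `lead` are hypothetical names introduced here, not part of the original answer.

```r
data <- rbind(c(1, 1, 2, 3, 4, 2, 3, 2),
              c(NA, 1, NA, 4, 1, 1, NA, 2),
              c(1, 4, 6, 7, 3, 1, 2, 2),
              c(NA, 3, NA, 1, NA, 2, NA, NA),
              c(1, 4, NA, NA, NA, 4, 3, 2))

# Row 5 as a logical NA mask
x <- is.na(data[5, ])
v <- cumsum(x)   # running NA count: 0 0 1 2 3 3 3 3
diff(v, 3)       # NAs in columns 2-4, 3-5, ..., 6-8: 2 3 2 1 0

# Hypothetical variant: prepend 0 so the window over columns 1-3 is checked too.
drop_row <- function(x) {
  v <- cumsum(x)
  any(diff(c(0, v), 3) == 3) || sum(x) >= 4
}

# A row that starts with exactly three NAs (made-up example)
lead <- is.na(c(NA, NA, NA, 1, 2, 3, 4, 5))
any(diff(cumsum(lead), 3) == 3)   # FALSE: the leading run is missed
drop_row(lead)                    # TRUE

# On the sample data this keeps rows 1, 2 and 3, as in the answer above
data[!apply(is.na(data), 1, drop_row), ]
```

On the sample data the result is the same as the original one-liner; the two only differ for rows whose sole run of three NAs sits in the first three columns.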
