排除整齐数据集中具有NA的组 [英] Exclude groups with NAs in tidy dataset

查看:81
本文介绍了排除整齐数据集中具有NA的组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个整洁的 tibble ,其中一个值列由4个ID列标识。

I have a tidy tibble with a value column identified by 4 ID columns.

 > MWA
# A tibble: 16 x 5
# Groups:   Dir [2]
      VP   Con   Dir   Seg time_seg
   <int> <int> <int> <int>    <int>
 1    10     2     1     1     1810
 2    10     2     1     2      260
 3    10     2     1     3      540
 4    10     2     1     4     1470
 5    10     2     1     5      460
 6    10     2     1     6      690
 7    10     2     1     7      760
 8    10     2     1     8       NA
 9    10     2     2     1      320
10    10     2     2     2     1110
11    10     2     2     3      450
12    10     2     2     4      600
13    10     2     2     5     1680
14    10     2     2     6      730
15    10     2     2     7      850
16    10     2     2     8      840

dput 要复制的是

> dput(MWA)
structure(list(VP = c(10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), Con = c(2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Dir = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
    Seg = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 
    6L, 7L, 8L), time_seg = c(1810L, 260L, 540L, 1470L, 460L, 
    690L, 760L, NA, 320L, 1110L, 450L, 600L, 1680L, 730L, 850L, 
    840L)), row.names = c(NA, -16L), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), vars = "Dir", drop = TRUE, indices = list(
    0:7, 8:15), group_sizes = c(8L, 8L), biggest_group_size = 8L, labels = structure(list(
    Dir = 1:2), row.names = c(NA, -2L), class = "data.frame", vars = "Dir", drop = TRUE))

它们来自更大的数据集,在其中按<$ c $分组c> VP , Con ,最后是 Dir

They stem from a larger data set, where they have been grouped by VP, Con and finally Dir.

如您所见,在第10小标题行中有一个 NA

As you can see, in tibble row 10 there is a NA.

根据这种情况,我现在想排除整个 Dir 组(所以第1行到第8行)使用 dplyr 缺少该值。

I now want to exclude the whole Dir group (so rows 1 trough 8), based on this condition that this one value is missing using dplyr.

使用过滤器 is.na complete.cases 的c>仅删除 NA ,而不是完整的组(在此数据集中是一个案例。)。

Using the filter with is.na or complete.cases only removes the row with the NA, not the complete group (which is one "case" in this dataset).

推荐答案

使用 all()将评估整个组,因此您可以跳过 mutate 步骤。

Using all() will evaluate the entire group, so you can skip the mutate step.

MWA %>% 
  group_by(Dir) %>% 
  filter(all(!is.na(time_seg)))

# A tibble: 8 x 5
# Groups:   Dir [1]
     VP   Con   Dir   Seg time_seg
  <int> <int> <int> <int>    <int>
1    10     2     2     1      320
2    10     2     2     2     1110
3    10     2     2     3      450
4    10     2     2     4      600
5    10     2     2     5     1680
6    10     2     2     6      730
7    10     2     2     7      850
8    10     2     2     8      840

这篇关于排除整齐数据集中具有NA的组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆