Follow-up: Generalizing a data.frame subsetting function

Posted: 2021/9/7 19:36:27 Tags: r dataframe function dplyr tidyverse


Question

I'm following up on this great answer. In that answer, the foo function helped the user find out, for each unique study, whether the values of two other selected columns (group and outcome) are constant or vary. In that case, 4 situations (2^2) were possible:

group is constant but outcome varies.
outcome is constant but group varies.
outcome and group both vary.
neither outcome nor group varies.
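
To make the four situations concrete, here is a minimal base R sketch on a made-up data frame. The names toy, is_constant, and pattern are illustrative assumptions and do not come from the original answer. It labels each study by whether group and outcome are constant within it:

```r
# Toy data (hypothetical, not the asker's data): three studies.
toy <- data.frame(
  study   = c(1, 1, 2, 2, 3, 3),
  group   = c(1, 2, 1, 1, 1, 2),
  outcome = c(1, 1, 1, 2, 1, 2)
)

# TRUE when a column takes a single value within a study.
is_constant <- function(x) length(unique(x)) == 1

group_const   <- tapply(toy$group,   toy$study, is_constant)
outcome_const <- tapply(toy$outcome, toy$study, is_constant)

# One label per study: "<group constant?>/<outcome constant?>".
pattern <- paste(group_const, outcome_const, sep = "/")
names(pattern) <- names(group_const)

# study 1: "FALSE/TRUE", study 2: "TRUE/FALSE", study 3: "FALSE/FALSE"
pattern
```

The fourth situation, "TRUE/TRUE", would appear for a study in which both columns take a single value.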

Function foo2 below was offered to extract the rows of each study that meet each of the above 4 possibilities.

In this follow-up, I was wondering whether we can extend the foo2 function, perhaps by adding a ... argument, so that the user can input one or more var_col (e.g., var1_col, var2_col, ...), and the function will then extract the possible situations (2^(number of variables) situations)?

For example, in my data below, I have 4 selected columns (sample, group, outcome, control), and thus up to 16 (2^4) possible situations exist to be extracted from this data by a new function.

I understand that 2^(number of variables) grows quickly as a user adds variables, so I only want to extend the foo2 function to a few more variables.
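
As a quick check on the counting, each selected column is either constant or varying within a study, so k columns give 2^k possible patterns. The base R sketch below enumerates them for the four columns named in the question; vars, n_patterns, and patterns are illustrative names only:

```r
# Columns checked for "constant vs. varies" (names taken from the question).
vars <- c("sample", "group", "outcome", "control")

# Two states per column, so 2^k patterns for k columns.
n_patterns <- 2^length(vars)

# Enumerate every TRUE/FALSE combination explicitly.
patterns <- expand.grid(rep(list(c(TRUE, FALSE)), length(vars)))
names(patterns) <- vars

n_patterns      # 16
nrow(patterns)  # 16
```

This is an upper bound: a given data set only contains the patterns that actually occur in its studies.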

h = "
   case sample group outcome control
1     1      1     1       1       1
2     1      2     1       1       1
3     1      1     2       1       1
4     1      2     2       1       1
5     1      1     1       2       1
6     1      2     1       2       1
7     1      1     2       2       1
8     1      2     2       2       1
9     2      1     1       1       1
10    2      1     2       1       1
11    2      1     1       2       1
12    2      1     2       2       1
13    3      1     1       1       1
14    3      2     1       1       1
15    3      1     1       2       1
16    3      2     1       2       1
17    4      1     1       1       1
18    4      2     1       1       1
19    4      1     2       1       1
20    4      2     2       1       1
21    5      1     1       1       1
22    5      1     2       1       1
23    6      1     1       1       1
24    6      1     2       1       1
25    7      1     1       1       1
26    7      2     1       1       1
27    8      1     1       1       1
28    9      1     1       1       1
29    9      2     1       1       1
30    9      1     2       1       1
31    9      2     2       1       1
32    9      1     1       2       1
33    9      2     1       2       1
34    9      1     2       2       1
35    9      2     2       2       1
36    9      1     1       1       2
37    9      2     1       1       2
38    9      1     2       1       2
39    9      2     2       1       2
40    9      1     1       2       2
41    9      2     1       2       2
42    9      1     2       2       2
43    9      2     2       2       2
44   10      1     1       1       1
45   10      1     2       1       1
46   10      1     1       2       1
47   10      1     2       2       1
48   10      1     1       1       2
49   10      1     2       1       2
50   10      1     1       2       2
51   10      1     2       2       2
52   11      1     1       1       1
53   11      2     1       1       1
54   11      1     1       2       1
55   11      2     1       2       1
56   11      1     1       1       2
57   11      2     1       1       2
58   11      1     1       2       2
59   11      2     1       2       2
60   12      1     1       1       1
61   12      2     1       1       1
62   12      1     2       1       1
63   12      2     2       1       1
64   12      1     1       1       2
65   12      2     1       1       2
66   12      1     2       1       2
67   12      2     2       1       2
68   13      1     1       1       1
69   13      1     2       1       1
70   13      1     1       1       2
71   13      1     2       1       2
72   14      1     1       1       1
73   14      1     2       1       1
74   14      1     1       1       2
75   14      1     2       1       2
76   15      1     1       1       1
77   15      2     1       1       1
78   15      1     1       1       2
79   15      2     1       1       2
80   16      1     1       1       1
81   16      1     1       1       2"
h <- read.table(text = h, header = TRUE)

library(dplyr)  # provides %>% and n_distinct()

foo2 <- function(dat, study_col, group_col, outcome_col) {

    dat %>%
        dplyr::select({{study_col}}, {{group_col}}, {{outcome_col}}) %>%
        dplyr::group_by({{study_col}}) %>%
        # grp is e.g. "TRUEFALSE": group is constant, outcome varies
        dplyr::mutate(grp = stringr::str_c(n_distinct({{group_col}}) == 1,
                                           n_distinct({{outcome_col}}) == 1)) %>%
        dplyr::ungroup() %>%
        dplyr::group_split(grp, .keep = FALSE)
}

Recommended answer

If we want to add the three dots (...) at the end, we can capture them as symbols with ensyms and convert those to strings (as_string). The only change is then to loop across those columns, create a logical condition with n_distinct, and paste the results together (str_c) with reduce, while also pasting in the flags for the other columns (this could be simplified into a single across as well). Finally, use group_split on the resulting 'grp' column.

# assumes dplyr is attached (for %>%, across, all_of, n_distinct)
foo2 <- function(dat, study_col, group_col, outcome_col, ...) {

    # capture the extra columns passed via ... as symbols, then as strings
    dot_cols <- rlang::ensyms(...)
    str_cols <- purrr::map_chr(dot_cols, rlang::as_string)

    dat %>%
        dplyr::select({{study_col}}, {{group_col}}, {{outcome_col}}, !!! dot_cols) %>%
        dplyr::group_by({{study_col}}) %>%
        # first paste the constant/varies flags of the ... columns,
        # then prepend the flags for group_col and outcome_col
        dplyr::mutate(grp = across(all_of(str_cols), ~ n_distinct(.) == 1) %>%
                          purrr::reduce(stringr::str_c, collapse = ""),
                      grp = stringr::str_c(n_distinct({{group_col}}) == 1,
                                           n_distinct({{outcome_col}}) == 1, grp)) %>%
        dplyr::ungroup() %>%
        dplyr::group_split(grp, .keep = FALSE)
}

-testing

> out <- foo2(h, case, group, outcome, sample, control)
> out
<list_of<
  tbl_df<
    case   : integer
    group  : integer
    outcome: integer
    sample : integer
    control: integer
  >
>[14]>
[[1]]
# A tibble: 16 x 5
    case group outcome sample control
   <int> <int>   <int>  <int>   <int>
 1     9     1       1      1       1
 2     9     1       1      2       1
 3     9     2       1      1       1
 4     9     2       1      2       1
 5     9     1       2      1       1
 6     9     1       2      2       1
 7     9     2       2      1       1
 8     9     2       2      2       1
 9     9     1       1      1       2
10     9     1       1      2       2
11     9     2       1      1       2
12     9     2       1      2       2
13     9     1       2      1       2
14     9     1       2      2       2
15     9     2       2      1       2
16     9     2       2      2       2

[[2]]
# A tibble: 8 x 5
   case group outcome sample control
  <int> <int>   <int>  <int>   <int>
1     1     1       1      1       1
2     1     1       1      2       1
3     1     2       1      1       1
4     1     2       1      2       1
5     1     1       2      1       1
6     1     1       2      2       1
7     1     2       2      1       1
8     1     2       2      2       1

[[3]]
# A tibble: 8 x 5
   case group outcome sample control
  <int> <int>   <int>  <int>   <int>
1    10     1       1      1       1
2    10     2       1      1       1
3    10     1       2      1       1
4    10     2       2      1       1
5    10     1       1      1       2
6    10     2       1      1       2
7    10     1       2      1       2
8    10     2       2      1       2

[[4]]
# A tibble: 4 x 5
   case group outcome sample control
  <int> <int>   <int>  <int>   <int>
1     2     1       1      1       1
2     2     2       1      1       1
3     2     1       2      1       1
4     2     2       2      1       1

[[5]]
# A tibble: 8 x 5
   case group outcome sample control
  <int> <int>   <int>  <int>   <int>
1    12     1       1      1       1
2    12     1       1      2       1
3    12     2       1      1       1
4    12     2       1      2       1
5    12     1       1      1       2
6    12     1       1      2       2
7    12     2       1      1       2
8    12     2       1      2       2

[[6]]
# A tibble: 4 x 5
   case group outcome sample control
  <int> <int>   <int>  <int>   <int>
1     4     1       1      1       1
2     4     1       1      2       1
3     4     2       1      1       1
4     4     2       1      2       1

[[7]]
# A tibble: 8 x 5
   case group outcome sample control
  <int> <int>   <int>  <int>   <int>
1    13     1       1      1       1
2    13     2       1      1       1
3    13     1       1      1       2
4    13     2       1      1       2
5    14     1       1      1       1
6    14     2       1      1       1
7    14     1       1      1       2
8    14     2       1      1       2

[[8]]
# A tibble: 4 x 5
   case group outcome sample control
  <int> <int>   <int>  <int>   <int>
1     5     1       1      1       1
2     5     2       1      1       1
3     6     1       1      1       1
4     6     2       1      1       1

[[9]]
# A tibble: 8 x 5
   case group outcome sample control
  <int> <int>   <int>  <int>   <int>
1    11     1       1      1       1
2    11     1       1      2       1
3    11     1       2      1       1
4    11     1       2      2       1
5    11     1       1      1       2
6    11     1       1      2       2
7    11     1       2      1       2
8    11     1       2      2       2

[[10]]
# A tibble: 4 x 5
   case group outcome sample control
  <int> <int>   <int>  <int>   <int>
1     3     1       1      1       1
2     3     1       1      2       1
3     3     1       2      1       1
4     3     1       2      2       1

[[11]]
# A tibble: 4 x 5
   case group outcome sample control
  <int> <int>   <int>  <int>   <int>
1    15     1       1      1       1
2    15     1       1      2       1
3    15     1       1      1       2
4    15     1       1      2       2

[[12]]
# A tibble: 2 x 5
   case group outcome sample control
  <int> <int>   <int>  <int>   <int>
1     7     1       1      1       1
2     7     1       1      2       1

[[13]]
# A tibble: 2 x 5
   case group outcome sample control
  <int> <int>   <int>  <int>   <int>
1    16     1       1      1       1
2    16     1       1      1       2

[[14]]
# A tibble: 1 x 5
   case group outcome sample control
  <int> <int>   <int>  <int>   <int>
1     8     1       1      1       1

Checking the total number of rows

> sum(map_dbl(out, nrow))
[1] 81
> nrow(h)
[1] 81
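
The row-count check confirms that no rows were lost. A complementary check, sketched here in base R on made-up data, is that each study ends up in exactly one piece of the split. The names toy, flag, and pieces are hypothetical, and split() stands in for group_split() so that the sketch needs no packages:

```r
# Toy data (hypothetical): study 2 has a constant x, studies 1 and 3 do not.
toy <- data.frame(
  study = c(1, 1, 2, 2, 3, 3),
  x     = c(1, 2, 1, 1, 1, 2)
)

# Per-row flag: 1 if x is constant within the row's study, 0 otherwise.
flag <- ave(toy$x, toy$study, FUN = function(v) length(unique(v)) == 1)

# Split the rows by that flag, much like group_split() does on grp.
pieces <- split(toy, flag)

# Check 1: every row is kept exactly once.
rows_kept <- sum(vapply(pieces, nrow, integer(1)))

# Check 2: no study appears in more than one piece.
studies_per_piece <- lapply(pieces, function(p) unique(p$study))
no_overlap <- anyDuplicated(unlist(studies_per_piece)) == 0

rows_kept == nrow(toy)
no_overlap
```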

If we want to remove some arguments

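# assumes dplyr is attached (for %>%, across, all_of, n_distinct)
foo2 <- function(dat, study_col, ...) {

    # capture every column passed via ... as symbols, then as strings
    dot_cols <- rlang::ensyms(...)
    str_cols <- purrr::map_chr(dot_cols, rlang::as_string)

    dat %>%
        dplyr::select({{study_col}}, !!! dot_cols) %>%
        dplyr::group_by({{study_col}}) %>%
        # one constant/varies flag per column, pasted into a single key
        dplyr::mutate(grp = across(all_of(str_cols), ~ n_distinct(.) == 1) %>%
                          purrr::reduce(stringr::str_c, collapse = "")) %>%
        dplyr::ungroup() %>%
        dplyr::group_split(grp, .keep = FALSE)
}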

-testing

>  foo2(h, case, group, outcome, sample, control)
<list_of<
  tbl_df<
    case   : integer
    group  : integer
    outcome: integer
    sample : integer
    control: integer
  >
>[14]>
# (remaining output omitted: the same 14 tibbles as in the first test above)
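
The answer notes that the flags could be built in a single across call. One way this might look is sketched below. The function name foo3 and the toy data are hypothetical and not part of the original answer, and the sketch assumes dplyr 1.0 or later is installed:

```r
library(dplyr)

# Hypothetical variant: all selected columns are handled by one across() call.
foo3 <- function(dat, study_col, ...) {
  dat %>%
    select({{ study_col }}, ...) %>%
    group_by({{ study_col }}) %>%
    # across() returns a one-row tibble of TRUE/FALSE flags per study
    # (grouping columns are excluded); paste0() joins them into one key.
    mutate(grp = do.call(
      paste0,
      unname(as.list(across(everything(), ~ n_distinct(.x) == 1)))
    )) %>%
    ungroup() %>%
    group_split(grp, .keep = FALSE)
}

# Toy data (made up for illustration).
toy <- data.frame(
  study   = c(1, 1, 2, 2, 3, 3),
  group   = c(1, 2, 1, 1, 1, 2),
  outcome = c(1, 1, 1, 2, 1, 2)
)

out <- foo3(toy, study, group, outcome)
length(out)                           # one piece per pattern present
sum(vapply(out, nrow, integer(1)))    # should equal nrow(toy)
```

Calling unname() on the flag list keeps a column that happens to be named like a paste0() argument (for example collapse) from being misread as that argument.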
