Generalizing a data.frame subsetting function


Problem description

I have a toy data.frame with 4 columns (study, outcome, group, time). Say a user wants to know, within each unique study value, whether the values of any other selected columns are constant or vary.

For example, if the user wants to know in which unique study values the outcome and group column values are constant or vary, then 4 situations are possible:

  1. group is constant but outcome varies.
  2. outcome is constant but group varies.
  3. Both outcome and group vary.
  4. Neither outcome nor group varies.

The function foo below is based exactly on the example above.

Question: How can I generalize foo so that the user can pass the names of the selected columns (e.g., outcome and group) to the function, and foo automatically examines in which unique study values each of the selected columns is constant or varies?

P.S. For the example below, my generalized function should produce the same output as shown.

h = "
study outcome group time
a     1       1     0
a     2       1     1
b     1       1     0
b     1       2     0
c     2       1     0
c     3       2     1
d     1       1     0
d     1       1     0
e     1       1     0"
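h <- read.table(text = h, header = TRUE)

library(dplyr)  # provides %>%, group_by(), filter(), n_distinct(), ungroup()

foo <- function(dat, cond) {

  # cond (1 to 4) selects one of the four situations listed above
  switch(cond,
         # 1: group constant, outcome varies
         `1` = dat %>%
           group_by(study) %>%
           filter(n_distinct(group) == 1, n_distinct(outcome) > 1) %>%
           ungroup(),
         # 2: group varies, outcome constant
         `2` = dat %>%
           group_by(study) %>%
           filter(n_distinct(group) > 1, n_distinct(outcome) == 1) %>%
           ungroup(),
         # 3: both vary
         `3` = dat %>%
           group_by(study) %>%
           filter(n_distinct(group) > 1, n_distinct(outcome) > 1) %>%
           ungroup(),
         # 4: neither varies
         `4` = dat %>%
           group_by(study) %>%
           filter(n_distinct(group) == 1, n_distinct(outcome) == 1) %>%
           ungroup()
  )
}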

#------------------- EXAMPLE OF USE:
> foo(h, 1)
# A tibble: 2 x 4
  study outcome group  time
  <chr>   <int> <int> <int>
1 a           1     1     0
2 a           2     1     1
> foo(h, 2)
# A tibble: 2 x 4
  study outcome group  time
  <chr>   <int> <int> <int>
1 b           1     1     0
2 b           1     2     0
> foo(h, 3)
# A tibble: 2 x 4
  study outcome group  time
  <chr>   <int> <int> <int>
1 c           2     1     0
2 c           3     2     1
> foo(h, 4)
# A tibble: 3 x 4
  study outcome group  time
  <chr>   <int> <int> <int>
1 d           1     1     0
2 d           1     1     0
3 e           1     1     0
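
As a quick cross-check of the four situations, the sketch below tabulates, for each study, whether group and outcome vary and which situation number that corresponds to. It is only an illustration built on the toy data above; the names situation, group_varies, outcome_varies and cond are made up for this sketch and are not part of the original foo.

```r
library(dplyr)

# Toy data from the question, repeated so the sketch runs on its own
h <- read.table(header = TRUE, text = "
study outcome group time
a     1       1     0
a     2       1     1
b     1       1     0
b     1       2     0
c     2       1     0
c     3       2     1
d     1       1     0
d     1       1     0
e     1       1     0")

# One row per study: does each column take more than one distinct value?
situation <- h %>%
  group_by(study) %>%
  summarise(
    group_varies   = n_distinct(group) > 1,
    outcome_varies = n_distinct(outcome) > 1,
    .groups = "drop"
  ) %>%
  # Map the two flags onto the situation numbers 1 to 4 used by foo()
  mutate(cond = case_when(
    !group_varies &  outcome_varies ~ 1L,
     group_varies & !outcome_varies ~ 2L,
     group_varies &  outcome_varies ~ 3L,
    TRUE                            ~ 4L
  ))

print(situation)
```

Studies d and e both land in situation 4, which matches the three rows returned by foo(h, 4).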

Recommended answer

If the input arguments are unquoted, use {{ }}:

foo <- function(dat, study_col, group_col, outcome_col) {

  # fn1() returns the rows for one of the four situations
  fn1 <- function(cond) {
    switch(cond,
           `1` = dat %>%
             group_by({{study_col}}) %>%
             filter(n_distinct({{group_col}}) == 1, n_distinct({{outcome_col}}) > 1) %>%
             ungroup(),
           `2` = dat %>%
             group_by({{study_col}}) %>%
             filter(n_distinct({{group_col}}) > 1, n_distinct({{outcome_col}}) == 1) %>%
             ungroup(),
           `3` = dat %>%
             group_by({{study_col}}) %>%
             filter(n_distinct({{group_col}}) > 1, n_distinct({{outcome_col}}) > 1) %>%
             ungroup(),
           `4` = dat %>%
             group_by({{study_col}}) %>%
             filter(n_distinct({{group_col}}) == 1, n_distinct({{outcome_col}}) == 1) %>%
             ungroup()
    )
  }

  # One list element per situation, in the order 1 to 4
  purrr::map(1:4, ~ fn1(.x))
}

Testing:

> foo(h, study, group, outcome)
[[1]]
# A tibble: 2 x 4
  study outcome group  time
  <chr>   <int> <int> <int>
1 a           1     1     0
2 a           2     1     1

[[2]]
# A tibble: 2 x 4
  study outcome group  time
  <chr>   <int> <int> <int>
1 b           1     1     0
2 b           1     2     0

[[3]]
# A tibble: 2 x 4
  study outcome group  time
  <chr>   <int> <int> <int>
1 c           2     1     0
2 c           3     2     1

[[4]]
# A tibble: 3 x 4
  study outcome group  time
  <chr>   <int> <int> <int>
1 d           1     1     0
2 d           1     1     0
3 e           1     1     0
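
If the column names arrive as strings instead (for example from a config file or a Shiny input), {{ }} does not apply. A minimal sketch of that case, assuming the same toy data, could use across(all_of()) for grouping and the .data pronoun inside filter(). The function name foo_chr is hypothetical and not part of the original answer.

```r
library(dplyr)

# Toy data from the question, repeated so the sketch runs on its own
h <- read.table(header = TRUE, text = "
study outcome group time
a     1       1     0
a     2       1     1
b     1       1     0
b     1       2     0
c     2       1     0
c     3       2     1
d     1       1     0
d     1       1     0
e     1       1     0")

# Hypothetical string-based variant: every *_col argument is a column name
# given as a character string, e.g. "study".
foo_chr <- function(dat, study_col, group_col, outcome_col) {
  by_study <- group_by(dat, across(all_of(study_col)))
  res <- list(
    # 1: group constant, outcome varies
    filter(by_study, n_distinct(.data[[group_col]]) == 1, n_distinct(.data[[outcome_col]]) > 1),
    # 2: group varies, outcome constant
    filter(by_study, n_distinct(.data[[group_col]]) > 1, n_distinct(.data[[outcome_col]]) == 1),
    # 3: both vary
    filter(by_study, n_distinct(.data[[group_col]]) > 1, n_distinct(.data[[outcome_col]]) > 1),
    # 4: neither varies
    filter(by_study, n_distinct(.data[[group_col]]) == 1, n_distinct(.data[[outcome_col]]) == 1)
  )
  lapply(res, ungroup)
}

res_chr <- foo_chr(h, "study", "group", "outcome")
print(res_chr)
```

The list elements follow the same order as the situations 1 to 4 above.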

Or use:

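foo2 <- function(dat, study_col, group_col, outcome_col) {

  dat %>%
    dplyr::select({{study_col}}, {{group_col}}, {{outcome_col}}) %>%
    dplyr::group_by({{study_col}}) %>%
    # grp is a per-study label such as "TRUEFALSE":
    # is group constant? followed by is outcome constant?
    dplyr::mutate(grp = stringr::str_c(dplyr::n_distinct({{group_col}}) == 1,
                                       dplyr::n_distinct({{outcome_col}}) == 1)) %>%
    dplyr::ungroup() %>%
    # one tibble per label, ordered by the sorted labels, with grp dropped
    dplyr::group_split(grp, .keep = FALSE)
}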

Testing:

> foo2(h, study, group, outcome)
<list_of<
  tbl_df<
    study  : character
    group  : integer
    outcome: integer
  >
>[4]>
[[1]]
# A tibble: 2 x 3
  study group outcome
  <chr> <int>   <int>
1 c         1       2
2 c         2       3

[[2]]
# A tibble: 2 x 3
  study group outcome
  <chr> <int>   <int>
1 b         1       1
2 b         2       1

[[3]]
# A tibble: 2 x 3
  study group outcome
  <chr> <int>   <int>
1 a         1       1
2 a         1       2

[[4]]
# A tibble: 3 x 3
  study group outcome
  <chr> <int>   <int>
1 d         1       1
2 d         1       1
3 e         1       1
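
Both versions above are still tied to exactly two columns. One possible way to handle any number of selected columns is sketched below in base R: each study is labelled with the names of the selected columns that vary within it, and the rows are then split by that label. The function name foo_general, its arguments, and the label format ("none vary", names joined with " & ") are assumptions made for this sketch, not part of the original answer.

```r
# Toy data from the question, repeated so the sketch runs on its own
h <- read.table(header = TRUE, text = "
study outcome group time
a     1       1     0
a     2       1     1
b     1       1     0
b     1       2     0
c     2       1     0
c     3       2     1
d     1       1     0
d     1       1     0
e     1       1     0")

# Hypothetical generalisation: study_col is one column name (string),
# cols is a character vector of the columns to examine.
foo_general <- function(dat, study_col, cols) {
  pieces <- split(dat, dat[[study_col]])

  # For each study, name the selected columns with more than one distinct value
  label <- vapply(pieces, function(d) {
    varies <- vapply(cols, function(cl) length(unique(d[[cl]])) > 1, logical(1))
    if (any(varies)) paste(cols[varies], collapse = " & ") else "none vary"
  }, character(1))

  # Look up each row's study label, then split the rows by that label
  split(dat, label[as.character(dat[[study_col]])])
}

res_general <- foo_general(h, "study", c("group", "outcome"))
print(res_general)
```

With two columns this yields the same four groups of rows as before, named by which columns vary. Passing c("group", "outcome", "time") would examine time as well without any change to the function.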
