Generalizing a data.frame subsetting function
Problem description
I have a toy data.frame with 4 columns (study, outcome, group, time). Suppose a user wants to know, for each unique study value, whether the values in any of the other selected columns are constant or vary.
For example, if the user wants to know within which unique study values the outcome and group columns are constant or vary, then 4 situations are possible:
1. group is constant but outcome varies.
2. outcome is constant but group varies.
3. outcome and group both vary.
4. Neither outcome nor group varies.
The function foo below is based exactly on the above example.
Question: How can I generalize foo so that the user can pass in the names of the columns they select (e.g., outcome and group), and foo automatically determines within which unique study values each of the selected columns is constant or varies?
P.S. For the example below, my generalized function should produce the same output as shown.
h = "
study outcome group time
a 1 1 0
a 2 1 1
b 1 1 0
b 1 2 0
c 2 1 0
c 3 2 1
d 1 1 0
d 1 1 0
e 1 1 0"
h = read.table(text = h, header = TRUE)
library(dplyr)  # provides %>%, group_by(), filter(), n_distinct(), ungroup()

foo <- function(dat, cond) {
  # cond selects one of the 4 situations listed above
  switch(cond,
    `1` = dat %>%
      group_by(study) %>%
      filter(n_distinct(group) == 1, n_distinct(outcome) > 1) %>%
      ungroup(),
    `2` = dat %>%
      group_by(study) %>%
      filter(n_distinct(group) > 1, n_distinct(outcome) == 1) %>%
      ungroup(),
    `3` = dat %>%
      group_by(study) %>%
      filter(n_distinct(group) > 1, n_distinct(outcome) > 1) %>%
      ungroup(),
    `4` = dat %>%
      group_by(study) %>%
      filter(n_distinct(group) == 1, n_distinct(outcome) == 1) %>%
      ungroup()
  )
}
#------------------- EXAMPLE OF USE:
> foo(h, 1)
# A tibble: 2 x 3
study outcome group
<chr> <int> <int>
1 a 1 1
2 a 2 1
> foo(h, 2)
# A tibble: 2 x 3
study outcome group
<chr> <int> <int>
1 b 1 1
2 b 1 2
> foo(h, 3)
# A tibble: 2 x 3
study outcome group
<chr> <int> <int>
1 c 2 1
2 c 3 2
> foo(h, 4)
# A tibble: 3 x 3
study outcome group
<chr> <int> <int>
1 d 1 1
2 d 1 1
3 e 1 1
Recommended answer
If the input arguments are unquoted column names, wrap them in {{ }} inside the function:
foo <- function(dat, study_col, group_col, outcome_col) {
  fn1 <- function(cond) {
    switch(cond,
      `1` = dat %>%
        group_by({{study_col}}) %>%
        filter(n_distinct({{group_col}}) == 1, n_distinct({{outcome_col}}) > 1) %>%
        ungroup(),
      `2` = dat %>%
        group_by({{study_col}}) %>%
        filter(n_distinct({{group_col}}) > 1, n_distinct({{outcome_col}}) == 1) %>%
        ungroup(),
      `3` = dat %>%
        group_by({{study_col}}) %>%
        filter(n_distinct({{group_col}}) > 1, n_distinct({{outcome_col}}) > 1) %>%
        ungroup(),
      `4` = dat %>%
        group_by({{study_col}}) %>%
        filter(n_distinct({{group_col}}) == 1, n_distinct({{outcome_col}}) == 1) %>%
        ungroup()
    )
  }
  # Run all 4 conditions and return the results as a list
  purrr::map(1:4, ~ fn1(.x))
}
Testing:
> foo(h, study, group, outcome)
[[1]]
# A tibble: 2 x 4
study outcome group time
<chr> <int> <int> <int>
1 a 1 1 0
2 a 2 1 1
[[2]]
# A tibble: 2 x 4
study outcome group time
<chr> <int> <int> <int>
1 b 1 1 0
2 b 1 2 0
[[3]]
# A tibble: 2 x 4
study outcome group time
<chr> <int> <int> <int>
1 c 2 1 0
2 c 3 2 1
[[4]]
# A tibble: 3 x 4
study outcome group time
<chr> <int> <int> <int>
1 d 1 1 0
2 d 1 1 0
3 e 1 1 0
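If the column names arrive as character strings instead (for example, from user input), one option is dplyr's .data pronoun. The sketch below assumes dplyr is installed; foo_chr is a hypothetical name that is not part of the original answer, and it covers only case 1 (group constant, outcome varying) to keep the example short.

```r
library(dplyr)

# Toy data from the question
h <- read.table(text = "
study outcome group time
a 1 1 0
a 2 1 1
b 1 1 0
b 1 2 0
c 2 1 0
c 3 2 1
d 1 1 0
d 1 1 0
e 1 1 0", header = TRUE)

# Hypothetical helper: column names are passed as strings, so they are
# looked up with .data[[...]] rather than embraced with {{ }}
foo_chr <- function(dat, study_col, group_col, outcome_col) {
  dat %>%
    group_by(.data[[study_col]]) %>%
    filter(n_distinct(.data[[group_col]]) == 1,
           n_distinct(.data[[outcome_col]]) > 1) %>%
    ungroup()
}

res_chr <- foo_chr(h, "study", "group", "outcome")
res_chr
```

The other three cases would follow the same pattern with the comparison operators changed.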
Alternatively, label each study by whether group and outcome are constant, then split on that label:
foo2 <- function(dat, study_col, group_col, outcome_col) {
  dat %>%
    dplyr::select({{study_col}}, {{group_col}}, {{outcome_col}}) %>%
    dplyr::group_by({{study_col}}) %>%
    # grp is e.g. "TRUEFALSE": group is constant, outcome is not
    dplyr::mutate(grp = stringr::str_c(dplyr::n_distinct({{group_col}}) == 1,
                                       dplyr::n_distinct({{outcome_col}}) == 1)) %>%
    dplyr::ungroup() %>%
    dplyr::group_split(grp, .keep = FALSE)
}
Testing:
> foo2(h, study, group, outcome)
<list_of<
tbl_df<
study : character
group : integer
outcome: integer
>
>[4]>
[[1]]
# A tibble: 2 x 3
study group outcome
<chr> <int> <int>
1 c 1 2
2 c 2 3
[[2]]
# A tibble: 2 x 3
study group outcome
<chr> <int> <int>
1 b 1 1
2 b 2 1
[[3]]
# A tibble: 2 x 3
study group outcome
<chr> <int> <int>
1 a 1 1
2 a 1 2
[[4]]
# A tibble: 3 x 3
study group outcome
<chr> <int> <int>
1 d 1 1
2 d 1 1
3 e 1 1
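Neither function above accepts an arbitrary set of columns, which is what the question asks for. A minimal base R sketch of one way to do that follows. The name split_by_constancy and its by and cols arguments are hypothetical and introduced here only for illustration, and the label format is an arbitrary choice.

```r
# Toy data from the question
h <- read.table(text = "
study outcome group time
a 1 1 0
a 2 1 1
b 1 1 0
b 1 2 0
c 2 1 0
c 3 2 1
d 1 1 0
d 1 1 0
e 1 1 0", header = TRUE)

# Hypothetical helper: `by` is the grouping column name and `cols` is a
# character vector of the columns to check, both given as strings
split_by_constancy <- function(dat, by, cols) {
  labels <- lapply(cols, function(col) {
    # Number of distinct values of `col` within each `by` group,
    # repeated for every row of that group
    n_unique <- ave(as.integer(factor(dat[[col]])), dat[[by]],
                    FUN = function(x) length(unique(x)))
    paste0(col, ifelse(n_unique == 1, ":constant", ":varies"))
  })
  # One combined label per row, e.g. "group:constant, outcome:varies"
  pattern <- do.call(paste, c(labels, sep = ", "))
  split(dat, pattern)
}

res <- split_by_constancy(h, by = "study", cols = c("group", "outcome"))
names(res)
```

The result is a list named by pattern, so a given situation can be pulled out by name, for example res[["group:varies, outcome:varies"]]. Only patterns that actually occur in the data appear in the list.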