Follow-up: Generalizing a data.frame subsetting function
Question
I'm following up on this great answer. In that answer, the foo function helped the user find out in which unique study the values of either of two other selected columns (group and outcome) are constant or vary. In that case, 4 situations (2^2) were possible:
group is constant but outcome varies.
outcome is constant but group varies.
outcome and group both vary.
outcome and group are both constant.
The function foo2 below was offered to extract the rows of each study that met each of the above 4 possibilities.
In this follow-up, I was wondering whether we can extend the foo2 function, perhaps by adding a ... argument, so that the user can supply one or more additional columns (e.g., var1_col, var2_col, ...) and the function will then extract the rows for every possible situation (2^(number of variables) situations).
For example, in my data below, I have 4 selected columns (sample, group, outcome, control), and thus up to 16 (2^4) possible situations could be extracted from this data by a new function.
I understand that 2^(number of variables) increases quickly as a user adds variables, so I only want to extend the foo2 function to a few more variables.
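As a quick check on the counting: each selected column is independently either constant or varying within a study, so k columns give 2^k patterns (4 for two columns, 16 for four). A minimal sketch that enumerates them in base R follows; the object names vars and patterns are only illustrative.

```r
# Columns whose within-study behaviour is checked (names taken from the question)
vars <- c("sample", "group", "outcome", "control")

# Each column is either constant (TRUE) or varying (FALSE) within a study,
# so all combinations are the rows of a 2 x 2 x ... x 2 grid
patterns <- expand.grid(
  setNames(rep(list(c(TRUE, FALSE)), length(vars)), vars)
)

nrow(patterns)  # number of possible patterns, 2^length(vars)
```

Not every pattern has to occur in a given data set, so a splitting function may return fewer pieces than this upper bound.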
h = "
case sample group outcome control
1 1 1 1 1 1
2 1 2 1 1 1
3 1 1 2 1 1
4 1 2 2 1 1
5 1 1 1 2 1
6 1 2 1 2 1
7 1 1 2 2 1
8 1 2 2 2 1
9 2 1 1 1 1
10 2 1 2 1 1
11 2 1 1 2 1
12 2 1 2 2 1
13 3 1 1 1 1
14 3 2 1 1 1
15 3 1 1 2 1
16 3 2 1 2 1
17 4 1 1 1 1
18 4 2 1 1 1
19 4 1 2 1 1
20 4 2 2 1 1
21 5 1 1 1 1
22 5 1 2 1 1
23 6 1 1 1 1
24 6 1 2 1 1
25 7 1 1 1 1
26 7 2 1 1 1
27 8 1 1 1 1
28 9 1 1 1 1
29 9 2 1 1 1
30 9 1 2 1 1
31 9 2 2 1 1
32 9 1 1 2 1
33 9 2 1 2 1
34 9 1 2 2 1
35 9 2 2 2 1
36 9 1 1 1 2
37 9 2 1 1 2
38 9 1 2 1 2
39 9 2 2 1 2
40 9 1 1 2 2
41 9 2 1 2 2
42 9 1 2 2 2
43 9 2 2 2 2
44 10 1 1 1 1
45 10 1 2 1 1
46 10 1 1 2 1
47 10 1 2 2 1
48 10 1 1 1 2
49 10 1 2 1 2
50 10 1 1 2 2
51 10 1 2 2 2
52 11 1 1 1 1
53 11 2 1 1 1
54 11 1 1 2 1
55 11 2 1 2 1
56 11 1 1 1 2
57 11 2 1 1 2
58 11 1 1 2 2
59 11 2 1 2 2
60 12 1 1 1 1
61 12 2 1 1 1
62 12 1 2 1 1
63 12 2 2 1 1
64 12 1 1 1 2
65 12 2 1 1 2
66 12 1 2 1 2
67 12 2 2 1 2
68 13 1 1 1 1
69 13 1 2 1 1
70 13 1 1 1 2
71 13 1 2 1 2
72 14 1 1 1 1
73 14 1 2 1 1
74 14 1 1 1 2
75 14 1 2 1 2
76 15 1 1 1 1
77 15 2 1 1 1
78 15 1 1 1 2
79 15 2 1 1 2
80 16 1 1 1 1
81 16 1 1 1 2"
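# The header has one fewer field than the data rows, so read.table() uses the first field as row names
h <- read.table(text = h, header = TRUE)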
library(dplyr)  # provides %>% and n_distinct()

foo2 <- function(dat, study_col, group_col, outcome_col) {
  dat %>%
    dplyr::select({{study_col}}, {{group_col}}, {{outcome_col}}) %>%
    dplyr::group_by({{study_col}}) %>%
    # one key per study: is group constant? is outcome constant?
    dplyr::mutate(grp = stringr::str_c(n_distinct({{group_col}}) == 1,
                                       n_distinct({{outcome_col}}) == 1)) %>%
    dplyr::ungroup() %>%
    # split the rows by key and drop the helper column
    dplyr::group_split(grp, .keep = FALSE)
}
Recommended answer
If we want to add the three dots (...) at the end, we can capture them as symbols with ensyms and convert them to strings (as_string). The only change is then to loop across those columns, create a logical condition with n_distinct, and paste the results together (str_c) with reduce, while also pasting in the flags for the other two columns (this could be simplified into a single across). Finally, use group_split on the resulting grp column.
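# assumes dplyr is attached (library(dplyr)) so that %>% is available
foo2 <- function(dat, study_col, group_col, outcome_col, ...) {
  dot_cols <- rlang::ensyms(...)                          # extra columns as symbols
  str_cols <- purrr::map_chr(dot_cols, rlang::as_string)  # and as strings
  dat %>%
    dplyr::select({{study_col}}, {{group_col}}, {{outcome_col}}, !!! dot_cols) %>%
    dplyr::group_by({{study_col}}) %>%
    # constant/vary flags for the extra columns, pasted into one string per study
    dplyr::mutate(grp = dplyr::across(dplyr::all_of(str_cols), ~ dplyr::n_distinct(.) == 1) %>%
                    purrr::reduce(stringr::str_c, collapse = ""),
                  # prepend the flags for the two fixed columns
                  grp = stringr::str_c(dplyr::n_distinct({{group_col}}) == 1,
                                       dplyr::n_distinct({{outcome_col}}) == 1, grp)) %>%
    dplyr::ungroup() %>%
    dplyr::group_split(grp, .keep = FALSE)
}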
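To make the grp key more concrete, here is a minimal sketch on a small made-up data frame (toy and flags are hypothetical names, not part of the answer). It computes the same kind of per-study constant/vary flags with summarise and across, then pastes them into one string per study, which is the value the function splits on.

```r
library(dplyr)

# Made-up data: case 1 varies in group only, case 2 in outcome only, case 3 in both
toy <- data.frame(
  case    = c(1, 1, 2, 2, 3, 3),
  group   = c(1, 2, 1, 1, 1, 2),
  outcome = c(1, 1, 1, 2, 1, 2)
)

# One row per study; TRUE means the column is constant within that study
flags <- toy %>%
  group_by(case) %>%
  summarise(across(c(group, outcome), ~ n_distinct(.x) == 1), .groups = "drop")

# The pasted flags form the splitting key, e.g. "FALSETRUE"
flags$grp <- paste0(flags$group, flags$outcome)
flags
```

Studies that share a key end up in the same piece of the split, which is why two studies can appear together in one tibble of the result.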
Testing
> out <- foo2(h, case, group, outcome, sample, control)
> out
<list_of<
tbl_df<
case : integer
group : integer
outcome: integer
sample : integer
control: integer
>
>[14]>
[[1]]
# A tibble: 16 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 9 1 1 1 1
2 9 1 1 2 1
3 9 2 1 1 1
4 9 2 1 2 1
5 9 1 2 1 1
6 9 1 2 2 1
7 9 2 2 1 1
8 9 2 2 2 1
9 9 1 1 1 2
10 9 1 1 2 2
11 9 2 1 1 2
12 9 2 1 2 2
13 9 1 2 1 2
14 9 1 2 2 2
15 9 2 2 1 2
16 9 2 2 2 2
[[2]]
# A tibble: 8 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 1 1 1 1 1
2 1 1 1 2 1
3 1 2 1 1 1
4 1 2 1 2 1
5 1 1 2 1 1
6 1 1 2 2 1
7 1 2 2 1 1
8 1 2 2 2 1
[[3]]
# A tibble: 8 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 10 1 1 1 1
2 10 2 1 1 1
3 10 1 2 1 1
4 10 2 2 1 1
5 10 1 1 1 2
6 10 2 1 1 2
7 10 1 2 1 2
8 10 2 2 1 2
[[4]]
# A tibble: 4 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 2 1 1 1 1
2 2 2 1 1 1
3 2 1 2 1 1
4 2 2 2 1 1
[[5]]
# A tibble: 8 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 12 1 1 1 1
2 12 1 1 2 1
3 12 2 1 1 1
4 12 2 1 2 1
5 12 1 1 1 2
6 12 1 1 2 2
7 12 2 1 1 2
8 12 2 1 2 2
[[6]]
# A tibble: 4 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 4 1 1 1 1
2 4 1 1 2 1
3 4 2 1 1 1
4 4 2 1 2 1
[[7]]
# A tibble: 8 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 13 1 1 1 1
2 13 2 1 1 1
3 13 1 1 1 2
4 13 2 1 1 2
5 14 1 1 1 1
6 14 2 1 1 1
7 14 1 1 1 2
8 14 2 1 1 2
[[8]]
# A tibble: 4 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 5 1 1 1 1
2 5 2 1 1 1
3 6 1 1 1 1
4 6 2 1 1 1
[[9]]
# A tibble: 8 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 11 1 1 1 1
2 11 1 1 2 1
3 11 1 2 1 1
4 11 1 2 2 1
5 11 1 1 1 2
6 11 1 1 2 2
7 11 1 2 1 2
8 11 1 2 2 2
[[10]]
# A tibble: 4 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 3 1 1 1 1
2 3 1 1 2 1
3 3 1 2 1 1
4 3 1 2 2 1
[[11]]
# A tibble: 4 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 15 1 1 1 1
2 15 1 1 2 1
3 15 1 1 1 2
4 15 1 1 2 2
[[12]]
# A tibble: 2 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 7 1 1 1 1
2 7 1 1 2 1
[[13]]
# A tibble: 2 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 16 1 1 1 1
2 16 1 1 1 2
[[14]]
# A tibble: 1 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 8 1 1 1 1
Checking the total number of rows
> sum(purrr::map_dbl(out, nrow))
[1] 81
> nrow(h)
[1] 81
If we want to drop the separate group_col and outcome_col arguments and pass every column through ...:
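# assumes dplyr is attached (library(dplyr)) so that %>% is available
foo2 <- function(dat, study_col, ...) {
  dot_cols <- rlang::ensyms(...)                          # all checked columns as symbols
  str_cols <- purrr::map_chr(dot_cols, rlang::as_string)  # and as strings
  dat %>%
    dplyr::select({{study_col}}, !!! dot_cols) %>%
    dplyr::group_by({{study_col}}) %>%
    # constant/vary flags for every column, pasted into one key per study
    dplyr::mutate(grp = dplyr::across(dplyr::all_of(str_cols), ~ dplyr::n_distinct(.) == 1) %>%
                    purrr::reduce(stringr::str_c, collapse = "")) %>%
    dplyr::ungroup() %>%
    dplyr::group_split(grp, .keep = FALSE)
}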
Testing
> foo2(h, case, group, outcome, sample, control)
<list_of<
tbl_df<
case : integer
group : integer
outcome: integer
sample : integer
control: integer
>
>[14]>
[[1]]
# A tibble: 16 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 9 1 1 1 1
2 9 1 1 2 1
3 9 2 1 1 1
4 9 2 1 2 1
5 9 1 2 1 1
6 9 1 2 2 1
7 9 2 2 1 1
8 9 2 2 2 1
9 9 1 1 1 2
10 9 1 1 2 2
11 9 2 1 1 2
12 9 2 1 2 2
13 9 1 2 1 2
14 9 1 2 2 2
15 9 2 2 1 2
16 9 2 2 2 2
[[2]]
# A tibble: 8 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 1 1 1 1 1
2 1 1 1 2 1
3 1 2 1 1 1
4 1 2 1 2 1
5 1 1 2 1 1
6 1 1 2 2 1
7 1 2 2 1 1
8 1 2 2 2 1
[[3]]
# A tibble: 8 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 10 1 1 1 1
2 10 2 1 1 1
3 10 1 2 1 1
4 10 2 2 1 1
5 10 1 1 1 2
6 10 2 1 1 2
7 10 1 2 1 2
8 10 2 2 1 2
[[4]]
# A tibble: 4 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 2 1 1 1 1
2 2 2 1 1 1
3 2 1 2 1 1
4 2 2 2 1 1
[[5]]
# A tibble: 8 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 12 1 1 1 1
2 12 1 1 2 1
3 12 2 1 1 1
4 12 2 1 2 1
5 12 1 1 1 2
6 12 1 1 2 2
7 12 2 1 1 2
8 12 2 1 2 2
[[6]]
# A tibble: 4 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 4 1 1 1 1
2 4 1 1 2 1
3 4 2 1 1 1
4 4 2 1 2 1
[[7]]
# A tibble: 8 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 13 1 1 1 1
2 13 2 1 1 1
3 13 1 1 1 2
4 13 2 1 1 2
5 14 1 1 1 1
6 14 2 1 1 1
7 14 1 1 1 2
8 14 2 1 1 2
[[8]]
# A tibble: 4 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 5 1 1 1 1
2 5 2 1 1 1
3 6 1 1 1 1
4 6 2 1 1 1
[[9]]
# A tibble: 8 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 11 1 1 1 1
2 11 1 1 2 1
3 11 1 2 1 1
4 11 1 2 2 1
5 11 1 1 1 2
6 11 1 1 2 2
7 11 1 2 1 2
8 11 1 2 2 2
[[10]]
# A tibble: 4 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 3 1 1 1 1
2 3 1 1 2 1
3 3 1 2 1 1
4 3 1 2 2 1
[[11]]
# A tibble: 4 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 15 1 1 1 1
2 15 1 1 2 1
3 15 1 1 1 2
4 15 1 1 2 2
[[12]]
# A tibble: 2 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 7 1 1 1 1
2 7 1 1 2 1
[[13]]
# A tibble: 2 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 16 1 1 1 1
2 16 1 1 1 2
[[14]]
# A tibble: 1 x 5
case group outcome sample control
<int> <int> <int> <int> <int>
1 8 1 1 1 1
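One practical point: group_split returns an unnamed list, so the output above does not say which constant/vary pattern each element corresponds to. As a rough alternative sketch in base R (split_by_pattern is a hypothetical helper, and the "C"/"V" coding is my own choice, not something from the answer), the pieces can be named by their pattern:

```r
# Hypothetical helper: split rows by each study's constant/vary pattern
# and name every piece after that pattern.
split_by_pattern <- function(dat, study_col, var_cols) {
  # One key per study: "C" if a column is constant within the study, "V" if it varies
  key_per_study <- vapply(
    split(dat[var_cols], dat[[study_col]]),
    function(d) {
      is_const <- vapply(d, function(x) length(unique(x)) == 1, logical(1))
      paste(ifelse(is_const, "C", "V"), collapse = "")
    },
    character(1)
  )
  # Look up each row's key via its study, then split on that key
  split(dat, key_per_study[as.character(dat[[study_col]])])
}

# Made-up data: case 1 varies in group only, case 2 in outcome only, case 3 in both
toy <- data.frame(
  case    = c(1, 1, 2, 2, 3, 3),
  group   = c(1, 2, 1, 1, 1, 2),
  outcome = c(1, 1, 1, 2, 1, 2)
)

res <- split_by_pattern(toy, "case", c("group", "outcome"))
names(res)
```

Here the columns are passed as character strings rather than bare names, which keeps the sketch free of tidy evaluation. The letters in each name follow the order of var_cols.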