Using dplyr to group_by and conditionally mutate a dataframe by group
Problem description
I'd like to use dplyr functions to group_by and conditionally mutate a df. Given this sample data:
A B C D
1 1 1 0.25
1 1 2 0
1 2 1 0.5
1 2 2 0
1 3 1 0.75
1 3 2 0.25
2 1 1 0
2 1 2 0.5
2 2 1 0
2 2 2 0
2 3 1 0
2 3 2 0
3 1 1 0.5
3 1 2 0
3 2 1 0.25
3 2 2 1
3 3 1 0
3 3 2 0.75
I want to use new column E to categorize A by whether B == 1, C == 2, and D > 0. For each unique value of A for which all of these conditions hold true, then E = 1, else E = 0. So, the output should look like this:
A B C D E
1 1 1 0.25 0
1 1 2 0 0
1 2 1 0.5 0
1 2 2 0 0
1 3 1 0.75 0
1 3 2 0.25 0
2 1 1 0 1
2 1 2 0.5 1
2 2 1 0 1
2 2 2 0 1
2 3 1 0 1
2 3 2 0 1
3 1 1 0.5 0
3 1 2 0 0
3 2 1 0.25 0
3 2 2 1 0
3 3 1 0 0
3 3 2 0.75 0
I initially tried this code but the conditionals don't seem to be working right:
foo$E <- foo %>%
group_by(A) %>%
mutate(E = {if (B == 1 & C == 2 & D > 0) 1 else 0})
Any insights appreciated. Thanks!
Recommended answer
@eipi10's answer works. However, I think you should use case_when instead of ifelse. It is vectorised and will be much faster on larger datasets.
foo %>% group_by(A) %>%
mutate(E = case_when(any(B == 1 & C == 2 & D > 0) ~ 1, TRUE ~ 0))
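For anyone who wants to check this against the sample data, here is a minimal, self-contained sketch. The original attempt fails because `if` expects a single logical value while `B == 1 & C == 2 & D > 0` yields one value per row in the group; it also assigns the whole piped data frame to `foo$E`. The construction of `foo` with `rep()` and the name `result` are my own assumptions for illustration. Since `any()` already reduces the test to one value per group, the sketch uses `as.integer(any(...))`, which should give the same 0/1 flags as the `case_when` call above (as integers rather than doubles).

```r
library(dplyr)

# Rebuild the sample data from the question (constructed here with rep();
# the original post only shows the printed table).
foo <- data.frame(
  A = rep(1:3, each = 6),
  B = rep(rep(1:3, each = 2), times = 3),
  C = rep(1:2, times = 9),
  D = c(0.25, 0, 0.5, 0, 0.75, 0.25,
        0, 0.5, 0, 0, 0, 0,
        0.5, 0, 0.25, 1, 0, 0.75)
)

result <- foo %>%
  group_by(A) %>%
  # any() collapses the row-wise test to a single TRUE/FALSE per group;
  # mutate() then recycles that single value across every row of the group.
  mutate(E = as.integer(any(B == 1 & C == 2 & D > 0))) %>%
  ungroup()

print(result, n = 18)
```

Only the group with A == 2 contains a row where B == 1, C == 2 and D > 0, so that is the only group flagged. The `ungroup()` at the end is optional; it just avoids carrying the grouping into later steps.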