Using dplyr to group_by and conditionally mutate only with if (without else) statement
Problem description
I have a data frame that I need to group by a combination of column entries in order to conditionally mutate several columns using only an if statement (without an else condition).
More specifically, I want to sum up the column values of a group if their sum exceeds a predefined threshold; otherwise the values should remain unchanged.
I have tried doing this with both if_else and case_when, but these functions either require a "false" argument (if_else) or, by default, set unmatched values to NA (case_when):
library(dplyr)

# Attempt 1: fails, because if_else() has no default for its `false` argument
iris_mutated <- iris %>%
  dplyr::group_by(Species) %>%
  dplyr::mutate(Sepal.Length = if_else(sum(Sepal.Length) > 250, sum(Sepal.Length)),
                Sepal.Width = if_else(sum(Sepal.Width) > 170, sum(Sepal.Width)),
                Petal.Length = if_else(sum(Petal.Length) > 70, sum(Petal.Length)),
                Petal.Width = if_else(sum(Petal.Width) > 15, sum(Petal.Width)))

# Attempt 2: runs, but groups whose sum does not exceed the threshold get NA
iris_mutated <- iris %>%
  dplyr::group_by(Species) %>%
  dplyr::mutate(Sepal.Length = case_when(sum(Sepal.Length) > 250 ~ sum(Sepal.Length)),
                Sepal.Width = case_when(sum(Sepal.Width) > 170 ~ sum(Sepal.Width)),
                Petal.Length = case_when(sum(Petal.Length) > 70 ~ sum(Petal.Length)),
                Petal.Width = case_when(sum(Petal.Width) > 15 ~ sum(Petal.Width)))
Any ideas how to do this instead?
Edit:
Here is an example of the expected output. The sum of the petal widths within each species group is 12.3 for setosa, 101.3 for virginica and 66.3 for versicolor. If I require this sum to be at least 15 for the values to be summed up (otherwise the original values should be kept), then I expect the following output (showing only the columns "Petal.Width" and "Species"):
Petal.Width Species
1 0.2 setosa
2 0.2 setosa
3 0.2 setosa
4 0.2 setosa
5 0.2 setosa
6 0.4 setosa
7 0.3 setosa
8 0.2 setosa
9 0.2 setosa
10 0.1 setosa
#...#
50 0.2 setosa
51 66.3 versicolor
52 66.3 versicolor
53 66.3 versicolor
#...#
100 66.3 versicolor
101 101.3 virginica
102 101.3 virginica
103 101.3 virginica
#...#
150 101.3 virginica
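The per-species totals quoted in the edit can be checked with a short summarise() call. In this sketch, petal_totals and exceeds are names introduced for illustration; exceeds marks the groups whose total passes the Petal.Width threshold used in the code above (> 15).

```r
library(dplyr)

# Per-species totals of Petal.Width, i.e. the quantity compared with the threshold
petal_totals <- iris %>%
  group_by(Species) %>%
  summarise(total = sum(Petal.Width), .groups = "drop") %>%
  # TRUE for the groups whose values would be replaced by the group sum
  mutate(exceeds = total > 15)

print(petal_totals)
```

Only versicolor and virginica pass the threshold, which matches the expected output above: setosa keeps its original petal widths.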
Recommended answer
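I think this is what you are after? It uses Johnny's method: give each case_when() a catch-all TRUE ~ <original column> branch, so that when a group's sum is not greater than the cutoff, the original values are kept instead of being set to NA. Using the original value as the fallback this way shouldn't cause an error.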
library(dplyr)

iris_mutated <- iris %>%
  group_by(Species) %>%
  # In each case_when(), the TRUE ~ <column> branch is the fallback that keeps
  # the original values when the group sum does not exceed the threshold
  mutate(Sepal.Length = case_when(sum(Sepal.Length) > 250 ~ sum(Sepal.Length),
                                  TRUE ~ Sepal.Length),
         Sepal.Width = case_when(sum(Sepal.Width) > 170 ~ sum(Sepal.Width),
                                 TRUE ~ Sepal.Width),
         Petal.Length = case_when(sum(Petal.Length) > 70 ~ sum(Petal.Length),
                                  TRUE ~ Petal.Length),
         Petal.Width = case_when(sum(Petal.Width) > 15 ~ sum(Petal.Width),
                                 TRUE ~ Petal.Width))
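If more columns need the same treatment, the four near-identical case_when() calls can be collapsed into one across() call. This is a sketch of an alternative, not part of the answer above: it assumes dplyr >= 1.0 (for across() and cur_column()) and keeps the question's thresholds in a named vector, thresholds, introduced here for illustration. Because the group sum is a single value, a scalar if/else is enough, and its else branch plays the same role as the TRUE ~ fallback in case_when().

```r
library(dplyr)

# Hypothetical lookup of per-column thresholds (values taken from the question)
thresholds <- c(Sepal.Length = 250, Sepal.Width = 170,
                Petal.Length = 70, Petal.Width = 15)

iris_mutated <- iris %>%
  group_by(Species) %>%
  mutate(across(all_of(names(thresholds)), function(x) {
    # cur_column() returns the name of the column across() is processing
    limit <- thresholds[[cur_column()]]
    # The group sum is one number, so a scalar if/else decides for the whole group;
    # the else branch keeps the original values
    if (sum(x) > limit) rep(sum(x), length(x)) else x
  })) %>%
  ungroup()
```

Either way a fallback has to be stated explicitly: the original values are only kept because the else (or TRUE ~) branch returns them.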