Using dplyr to group_by and conditionally mutate with only an if (no else) statement

Problem description

I have a data frame that I need to group by a combination of column entries so that I can conditionally mutate several columns using only an if statement (with no else condition).

More specifically, I want to replace a group's column values with their sum if that sum crosses a predefined threshold; otherwise the values should remain unchanged.

I have tried both if_else and case_when, but if_else requires a "false" argument and case_when by default sets values that match no condition to NA:

iris_mutated <- iris %>%
  dplyr::group_by(Species) %>%
  dplyr::mutate(Sepal.Length=if_else(sum(Sepal.Length)>250, sum(Sepal.Length)),
                Sepal.Width=if_else(sum(Sepal.Width)>170, sum(Sepal.Width)),
                Petal.Length=if_else(sum(Petal.Length)>70, sum(Petal.Length)),
                Petal.Width=if_else(sum(Petal.Width)>15, sum(Petal.Width)))

iris_mutated <- iris %>%
  dplyr::group_by(Species) %>%
  dplyr::mutate(Sepal.Length=case_when(sum(Sepal.Length)>250 ~ sum(Sepal.Length)),
                Sepal.Width=case_when(sum(Sepal.Width)>170 ~ sum(Sepal.Width)),
                Petal.Length=case_when(sum(Petal.Length)>70 ~ sum(Petal.Length)),
                Petal.Width=case_when(sum(Petal.Width)>15 ~ sum(Petal.Width)))
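
The behaviour behind both failed attempts can be reproduced on a small vector. This is only an illustrative sketch: the vector x and the names res and err are made up for the example and are not part of the original question.

```r
library(dplyr)

x <- c(1, 2, 3)  # hypothetical toy data

# case_when() with a single condition: elements that match no
# condition are filled with NA instead of keeping their value.
res <- case_when(x > 2 ~ x * 10)

# if_else() has no default for `false`, so omitting it raises an error.
# tryCatch() is used here only to capture that error for inspection.
err <- tryCatch(if_else(x > 2, x * 10), error = function(e) e)
```

Here res keeps the value for the third element only and is NA elsewhere, while err holds the error condition raised by if_else.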

Any ideas on how to do this instead?

Edit:

Here is an example of the expected output. The sum of the petal width over all entries grouped by species is 12.3 for setosa, 101.3 for virginica and 66.3 for versicolor. If I require this sum to be at least 15 for the values to be summed up (otherwise the original values should be kept), then I expect the following output (showing only the columns "Petal.Width" and "Species"):

    Petal.Width    Species
1           0.2     setosa
2           0.2     setosa
3           0.2     setosa
4           0.2     setosa
5           0.2     setosa
6           0.4     setosa
7           0.3     setosa
8           0.2     setosa
9           0.2     setosa
10          0.1     setosa
#...#
50          0.2     setosa
51         66.3 versicolor
52         66.3 versicolor
53         66.3 versicolor
#...#
100        66.3 versicolor
101       101.3  virginica
102       101.3  virginica
103       101.3  virginica
#...#
150       101.3  virginica


Recommended answer

I think this is what you are after, using Johnny's method. You should not hit an error if you supply the original value as the fallback in case_when for the case where the sum is not greater than the cutoff:

library(dplyr)

# For each species, replace a column with its group sum when that sum
# exceeds the cutoff; the TRUE branch keeps the original values otherwise.
iris_mutated <- iris %>%
  group_by(Species) %>%
  mutate(Sepal.Length = case_when(sum(Sepal.Length) > 250 ~ sum(Sepal.Length),
                                  TRUE ~ Sepal.Length),
         Sepal.Width = case_when(sum(Sepal.Width) > 170 ~ sum(Sepal.Width),
                                 TRUE ~ Sepal.Width),
         Petal.Length = case_when(sum(Petal.Length) > 70 ~ sum(Petal.Length),
                                  TRUE ~ Petal.Length),
         Petal.Width = case_when(sum(Petal.Width) > 15 ~ sum(Petal.Width),
                                 TRUE ~ Petal.Width))
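
Because the condition is a single TRUE or FALSE per group, a plain if ... else inside a small helper might also work and avoids repeating the fallback for every column. This is a sketch of an alternative, not the answer's method; the helper name sum_if_over is hypothetical and does not come from dplyr.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): return the group sum, repeated
# to the group's length, when it exceeds `cutoff`; otherwise return the
# values unchanged.
sum_if_over <- function(x, cutoff) {
  if (sum(x) > cutoff) rep(sum(x), length(x)) else x
}

iris_mutated <- iris %>%
  group_by(Species) %>%
  mutate(
    Sepal.Length = sum_if_over(Sepal.Length, 250),
    Sepal.Width  = sum_if_over(Sepal.Width, 170),
    Petal.Length = sum_if_over(Petal.Length, 70),
    Petal.Width  = sum_if_over(Petal.Width, 15)
  ) %>%
  ungroup()
```

The helper is called once per group because mutate evaluates each expression within the groups set by group_by, so sum(x) is the group sum rather than the sum over the whole column.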
