dplyr 包可以用于条件变异吗? [英] Can dplyr package be used for conditional mutating?

查看:34
本文介绍了dplyr 包可以用于条件变异吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

mutate 是否可以在mutate 有条件的情况下使用(取决于某些列值的值)?

Can the mutate be used when the mutation is conditional (depending on the values of certain column values)?

这个例子有助于说明我的意思.

This example helps showing what I mean.

structure(list(a = c(1, 3, 4, 6, 3, 2, 5, 1), b = c(1, 3, 4, 
2, 6, 7, 2, 6), c = c(6, 3, 6, 5, 3, 6, 5, 3), d = c(6, 2, 4, 
5, 3, 7, 2, 6), e = c(1, 2, 4, 5, 6, 7, 6, 3), f = c(2, 3, 4, 
2, 2, 7, 5, 2)), .Names = c("a", "b", "c", "d", "e", "f"), row.names = c(NA, 
8L), class = "data.frame")

  a b c d e f
1 1 1 6 6 1 2
2 3 3 3 2 2 3
3 4 4 6 4 4 4
4 6 2 5 5 5 2
5 3 6 3 3 6 2
6 2 7 6 7 7 7
7 5 2 5 2 6 5
8 1 6 3 6 3 2

我希望使用 dplyr 包找到我的问题的解决方案(是的,我知道这不是应该工作的代码,但我想它使目的很明确)来创建一个新列 g:

I was hoping to find a solution to my problem using the dplyr package (and yes I know this not code that should work, but I guess it makes the purpose clear) for creating a new column g:

 library(dplyr)
 df <- mutate(df,
         if (a == 2 | a == 5 | a == 7 | (a == 1 & b == 4)){g = 2},
         if (a == 0 | a == 1 | a == 4 | a == 3 |  c == 4) {g = 3})

我正在寻找的代码的结果在这个特定的例子中应该有这个结果:

The result of the code I am looking for should have this result in this particular example:

  a b c d e f  g
1 1 1 6 6 1 2  3
2 3 3 3 2 2 3  3
3 4 4 6 4 4 4  3
4 6 2 5 5 5 2 NA
5 3 6 3 3 6 2 NA
6 2 7 6 7 7 7  2
7 5 2 5 2 6 5  2
8 1 6 3 6 3 2  3

有没有人知道如何在 dplyr 中做到这一点?这个数据框只是一个例子,我正在处理的数据框要大得多.由于它的速度,我尝试使用 dplyr,但也许还有其他更好的方法来处理这个问题?

Does anyone have an idea about how to do this in dplyr? This data frame is just an example, the data frames I am dealing with are much larger. Because of its speed I tried to use dplyr, but perhaps there are other, better ways to handle this problem?

推荐答案

使用 ifelse

df %>%
  mutate(g = ifelse(a == 2 | a == 5 | a == 7 | (a == 1 & b == 4), 2,
               ifelse(a == 0 | a == 1 | a == 4 | a == 3 |  c == 4, 3, NA)))

已添加 - if_else: 请注意,在 dplyr 0.5 中定义了一个 if_else 函数,因此替代方法是将 ifelse 替换为 if_else;但是,请注意,由于 if_elseifelse 更严格(条件的两条腿必须具有相同的类型),因此在这种情况下 NA必须替换为 NA_real_ .

Added - if_else: Note that in dplyr 0.5 there is an if_else function defined so an alternative would be to replace ifelse with if_else; however, note that since if_else is stricter than ifelse (both legs of the condition must have the same type) so the NA in that case would have to be replaced with NA_real_ .

df %>%
  mutate(g = if_else(a == 2 | a == 5 | a == 7 | (a == 1 & b == 4), 2,
               if_else(a == 0 | a == 1 | a == 4 | a == 3 |  c == 4, 3, NA_real_)))

已添加 - case_when 由于发布此问题,dplyr 已添加 case_when,因此另一种选择是:

Added - case_when Since this question was posted dplyr has added case_when so another alternative would be:

df %>% mutate(g = case_when(a == 2 | a == 5 | a == 7 | (a == 1 & b == 4) ~ 2,
                            a == 0 | a == 1 | a == 4 | a == 3 |  c == 4 ~ 3,
                            TRUE ~ NA_real_))

添加-算术/na_if 如果值是数字并且条件(除了最后的默认值NA)是互斥的,就像问题中的情况,那么我们可以使用算术表达式,使每一项都乘以所需的结果,最后使用 na_if 将 0 替换为 NA.

Added - arithmetic/na_if If the values are numeric and the conditions (except for the default value of NA at the end) are mutually exclusive, as is the case in the question, then we can use an arithmetic expression such that each term is multiplied by the desired result using na_if at the end to replace 0 with NA.

df %>%
  mutate(g = 2 * (a == 2 | a == 5 | a == 7 | (a == 1 & b == 4)) +
             3 * (a == 0 | a == 1 | a == 4 | a == 3 |  c == 4),
         g = na_if(g, 0))

这篇关于dplyr 包可以用于条件变异吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆