Combine mutate with conditional values


Problem description


In a large data frame ("myfile") with four columns, I have to add a fifth column whose values depend conditionally on the first four columns. Recently I have become a huge fan of dplyr, mainly because of its speed on large datasets, so I was wondering whether I could solve my problem using the mutate function.

My dataframe (actually a shorter version of it) looks a bit like this:

  V1 V2 V3 V4
1  1  2  3  5
2  2  4  4  1
3  1  4  1  1
4  4  5  1  3
5  5  5  5  4

The values of the fifth column (V5) are based on some conditional rules:

if (V1 == 1 & V2 != 4) {
  V5 <- 1
} else if (V2 == 4 & V3 != 1) {
  V5 <- 2
} else {
  V5 <- 0
}

Now I want to use the mutate function to apply these rules to all rows (so I don't have to use a slow loop). Something like this (and yes, I know it doesn't work this way!):

myfile <- mutate(myfile, if (V1==1 & V2!=4){V5 = 1}
    else if (V2==4 & V3!=1){V5 = 2}
    else {V5 = 0})

This should be the result:

  V1 V2 V3 V4 V5
1  1  2  3  5  1
2  2  4  4  1  2
3  1  4  1  1  0
4  4  5  1  3  0
5  5  5  5  4  0

How can I do this in dplyr?

Solution

Try this:

> myfile %>% mutate(V5 = (V1 == 1 & V2 != 4) + 2 * (V2 == 4 & V3 != 1))
  V1 V2 V3 V4 V5
1  1  2  3  5  1
2  2  4  4  1  2
3  1  4  1  1  0
4  4  5  1  3  0
5  5  5  5  4  0
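
The arithmetic version works because R coerces TRUE/FALSE to 1/0, and because the two conditions can never both hold for the same row (one requires V2 != 4, the other V2 == 4). A minimal base-R sketch of that reasoning, using the same data; the names cond1 and cond2 are introduced here for illustration only and are not part of the answer above:

```r
# Same data as in the question, built with data.frame() for brevity.
myfile <- data.frame(
  V1 = c(1L, 2L, 1L, 4L, 5L),
  V2 = c(2L, 4L, 4L, 5L, 5L),
  V3 = c(3L, 4L, 1L, 1L, 5L),
  V4 = c(5L, 1L, 1L, 3L, 4L)
)

# cond1 and cond2 are illustrative names for the two rules.
cond1 <- with(myfile, V1 == 1 & V2 != 4)
cond2 <- with(myfile, V2 == 4 & V3 != 1)

# The rules are mutually exclusive, so the sum can only be 0, 1, or 2.
stopifnot(!any(cond1 & cond2))

# Logical values are coerced to 1/0 in arithmetic.
V5 <- cond1 + 2 * cond2
print(V5)
```

If the conditions could overlap, the sum would produce codes such as 3, so this trick is only safe when the rules exclude each other.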

or this:

> myfile %>% mutate(V5 = ifelse(V1 == 1 & V2 != 4, 1, ifelse(V2 == 4 & V3 != 1, 2, 0)))
  V1 V2 V3 V4 V5
1  1  2  3  5  1
2  2  4  4  1  2
3  1  4  1  1  0
4  4  5  1  3  0
5  5  5  5  4  0
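
With more than a couple of rules, nested ifelse() calls become hard to read. Reasonably recent dplyr versions also provide case_when(), which checks the conditions in order and uses the first one that matches, much like the if / else if / else chain in the question. A sketch, assuming an installed dplyr that includes case_when() and the same data as above; the name result is chosen here for illustration:

```r
library(dplyr)

# Same data as in the question, built with data.frame() for brevity.
myfile <- data.frame(
  V1 = c(1L, 2L, 1L, 4L, 5L),
  V2 = c(2L, 4L, 4L, 5L, 5L),
  V3 = c(3L, 4L, 1L, 1L, 5L),
  V4 = c(5L, 1L, 1L, 3L, 4L)
)

# Each formula is condition ~ value; TRUE ~ 0 acts as the final "else".
result <- myfile %>%
  mutate(V5 = case_when(
    V1 == 1 & V2 != 4 ~ 1,
    V2 == 4 & V3 != 1 ~ 2,
    TRUE ~ 0
  ))

print(result)
```

Unlike the arithmetic version, this form does not rely on the conditions being mutually exclusive, because only the first matching rule is applied.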

I suggest choosing a better name for your data frame; myfile makes it seem as if it holds a file name.

The above used this input:

myfile <- 
structure(list(V1 = c(1L, 2L, 1L, 4L, 5L), V2 = c(2L, 4L, 4L, 
5L, 5L), V3 = c(3L, 4L, 1L, 1L, 5L), V4 = c(5L, 1L, 1L, 3L, 4L
)), .Names = c("V1", "V2", "V3", "V4"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5"))

Update: Since this was originally posted, dplyr has changed %.% to %>%, so I have modified the answer accordingly.
