How do I create a new column based on multiple conditions from multiple columns?
Question
I'm trying to add a new column to a data frame based on several conditions from other columns. I have the following data:
> commute <- c("walk", "bike", "subway", "drive", "ferry", "walk", "bike", "subway", "drive", "ferry", "walk", "bike", "subway", "drive", "ferry")
> kids <- c("Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes")
> distance <- c(1, 12, 5, 25, 7, 2, "", 8, 19, 7, "", 4, 16, 12, 7)
>
> df = data.frame(commute, kids, distance)
> df
   commute kids distance
1     walk  Yes        1
2     bike  Yes       12
3   subway   No        5
4    drive   No       25
5    ferry  Yes        7
6     walk  Yes        2
7     bike   No
8   subway   No        8
9    drive  Yes       19
10   ferry  Yes        7
11    walk   No
12    bike   No        4
13  subway  Yes       16
14   drive   No       12
15   ferry  Yes        7
If the following three conditions are met:
commute = walk OR bike OR subway OR ferry
AND
kids = Yes
AND
distance is less than 10
Then I'd like a new column called get.flyer to equal "Yes". The final data frame should look like this:
   commute kids distance get.flyer
1     walk  Yes        1       Yes
2     bike  Yes       12
3   subway   No        5
4    drive   No       25
5    ferry  Yes        7       Yes
6     walk  Yes        2       Yes
7     bike   No
8   subway   No        8
9    drive  Yes       19
10   ferry  Yes        7       Yes
11    walk   No
12    bike   No        4
13  subway  Yes       16
14   drive   No       12
15   ferry  Yes        7       Yes
Recommended answer
We can use %in% to compare a column against multiple values, and & to check that all of the conditions are TRUE.
library(dplyr)
df %>%
    mutate(get.flyer = c("", "Yes")[(commute %in% c("walk", "bike", "subway", "ferry") &
                                     as.character(kids) == "Yes" &
                                     as.numeric(as.character(distance)) < 10) + 1])
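The indexing in the mutate() call above can look cryptic at first. As a minimal base R sketch of what it does (the three-element vectors below are illustrative only, not the question's data): adding 1 to a logical vector turns FALSE/TRUE into the positions 1/2, which then select "" or "Yes" from the two-element vector.

```r
# Illustrative vectors only (not the question's full data)
commute  <- c("walk", "drive", "ferry")
kids     <- c("Yes", "Yes", "No")
distance <- c(3, 4, 5)

# %in% tests membership; & combines the conditions element-wise
cond <- commute %in% c("walk", "bike", "subway", "ferry") &
  kids == "Yes" &
  distance < 10

# FALSE + 1 is 1 and TRUE + 1 is 2, so idx holds positions into c("", "Yes")
idx <- cond + 1

# Position 1 selects "", position 2 selects "Yes"
flyer <- c("", "Yes")[idx]
print(flyer)
```

Only the first element satisfies all three conditions here, so it is the only one that receives "Yes".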
It is better to create the data.frame with stringsAsFactors = FALSE, because in R versions before 4.0.0 the default is TRUE (since R 4.0.0 the default is FALSE). With the old default, str(df) shows that all the columns are of class factor, including distance, because the "" entries turn the numeric vector into character before it is converted. For missing values, use NA instead of "" so that a numeric column keeps its class.
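To make the class issue concrete, here is a small base R sketch. The names d_blank, d_na, d_factor and d_fixed are made up for this illustration and do not appear in the question.

```r
# Mixing numbers with "" coerces the whole vector to character
d_blank <- c(1, 12, "", 8)
print(class(d_blank))

# Using NA for the missing value keeps the vector numeric
d_na <- c(1, 12, NA, 8)
print(class(d_na))

# If the column ended up as a factor, convert through character first;
# as.numeric() applied directly to a factor returns the internal level codes
d_factor <- factor(d_blank)
d_fixed <- as.numeric(as.character(d_factor))
print(d_fixed)
```

This is why the first version of the answer wraps distance in as.numeric(as.character(...)) before comparing it with 10.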
If we rewrite the creation of 'df' as 'df1', using NA for the missing distances:
distance <- c(1, 12, 5, 25, 7, 2, NA, 8, 19, 7, NA, 4, 16, 12, 7)
df1 <- data.frame(commute, kids, distance, stringsAsFactors=FALSE)
the code above can be simplified to:
df1 %>%
    mutate(get.flyer = c("", "Yes")[(commute %in% c("walk", "bike", "subway", "ferry") &
                                     kids == "Yes" &
                                     distance < 10) + 1])
For readability, some people prefer ifelse:
df1 %>%
    mutate(get.flyer = ifelse(commute %in% c("walk", "bike", "subway", "ferry") &
                              kids == "Yes" &
                              distance < 10,
                              "Yes", ""))
This can also be done easily with base R methods:
df1$get.flyer <- with(df1, ifelse(commute %in% c("walk", "bike", "subway", "ferry") &
                                  kids == "Yes" &
                                  distance < 10,
                                  "Yes", ""))
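One caveat worth sketching: in the question's data every row with a missing distance also has kids equal to "No", so the combined condition is never NA. If a row had kids equal to "Yes" and a missing distance, ifelse would return NA for it. The data frame ex and its columns flyer_na and flyer below are hypothetical, made up to show one possible way of treating an unknown distance as "condition not met".

```r
# Hypothetical rows, chosen so that a "Yes" household has a missing distance
ex <- data.frame(commute  = c("walk", "bike", "drive"),
                 kids     = c("Yes", "Yes", "Yes"),
                 distance = c(3, NA, 5),
                 stringsAsFactors = FALSE)

# Same condition as in the answer; the NA distance propagates through &
cond <- with(ex, commute %in% c("walk", "bike", "subway", "ferry") &
               kids == "Yes" &
               distance < 10)

# Plain ifelse() keeps the NA in the result
ex$flyer_na <- ifelse(cond, "Yes", "")

# Excluding NA explicitly treats an unknown distance as "condition not met"
ex$flyer <- ifelse(!is.na(cond) & cond, "Yes", "")
print(ex)
```

Whether an unknown distance should yield "" or NA depends on how the new column will be used, so this is a design choice rather than a correction to the answer.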