Mutate based on two conditions in an R data frame
Problem description
I have an R data frame that can be generated with the code below:
DF <- data.frame("Person_id" = c(1,1,1,1,2,2,2,2,3,3),
                 "Type" = c("IN","OUT","IN","ANC","IN","OUT","IN","ANC","EM","ANC"),
                 "Name" = c("Nara","Nara","Nara","Nara","Dora","Dora","Dora","Dora","Sara","Sara"),
                 "day_1" = c("21/1/2002","21/4/2002","21/6/2002","21/9/2002","28/1/2012","28/4/2012","28/6/2012","28/9/2012","30/06/2004","30/06/2005"),
                 "day_2" = c("23/1/2002","21/4/2002","","","30/1/2012","28/4/2012","","28/9/2012","",""))
I would like to create two new columns, admit_start_date and admit_end_date, based on the conditions below.
Rule 1
admit_start_date = day_1
admit_end_date = day_2 (day_2 can sometimes be NA; see Rule 2 below)
Rule 2
if day_2 is (null or blank or NA) and Type is (OUT or ANC or EM) then
admit_end_date = day_1
else (if Type is IN)
admit_end_date = day_1 + 5 (days)
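Stated as code, Rules 1 and 2 amount to a few vectorised assignments. The following is a minimal base-R sketch (no dplyr), assuming the day columns always use the dd/mm/yyyy format seen in DF; the helper variables (no_end, same_day, plus_five) are illustrative names only. Converting the strings to Date turns blank strings into NA, so one is.na() check covers both "blank" and "NA".

```r
# Base-R sketch of Rules 1 and 2; the data are copied from the question
DF <- data.frame(
  Person_id = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3),
  Type = c("IN", "OUT", "IN", "ANC", "IN", "OUT", "IN", "ANC", "EM", "ANC"),
  Name = c("Nara", "Nara", "Nara", "Nara", "Dora", "Dora", "Dora", "Dora",
           "Sara", "Sara"),
  day_1 = c("21/1/2002", "21/4/2002", "21/6/2002", "21/9/2002", "28/1/2012",
            "28/4/2012", "28/6/2012", "28/9/2012", "30/06/2004", "30/06/2005"),
  day_2 = c("23/1/2002", "21/4/2002", "", "", "30/1/2012", "28/4/2012", "",
            "28/9/2012", "", "")
)

# Parse dd/mm/yyyy strings; "" cannot be parsed and becomes NA
day_1 <- as.Date(DF$day_1, format = "%d/%m/%Y")
day_2 <- as.Date(DF$day_2, format = "%d/%m/%Y")

# Rule 1: copy the dates as they are
DF$admit_start_date <- day_1
DF$admit_end_date <- day_2

# Rule 2: fill the missing end dates depending on Type
no_end <- is.na(day_2)
same_day <- no_end & DF$Type %in% c("OUT", "ANC", "EM")
plus_five <- no_end & DF$Type == "IN"
DF$admit_end_date[same_day] <- day_1[same_day]
DF$admit_end_date[plus_five] <- day_1[plus_five] + 5  # Date + 5 adds 5 days

print(DF[, c("Type", "admit_start_date", "admit_end_date")])
```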
This is what I have tried, but it doesn't work:
transform_dates = function(DF){ # this function is to create 'date' columns
  DF %>%
    mutate(admit_start_date = day_1) %>%
    mutate(admit_end_date = day_2) %>%
    admit_end_date = if_else(((Type == 'Out' & admit_end_date.isna() ==True|Type == 'ANC' & admit_end_date.isna() ==True|Type == 'EM' & admit_end_date.isna() ==True),day_1,day_1 + 5)
  )
}
As you can see, I am not sure how to check a newly created column for NA and replace those NAs with day_1 or day_1 + 5 (days), depending on the Type column.
Can you help?
I expect my output to look as shown below.
Recommended answer
After converting the "day" columns to actual Date objects, we can use case_when to specify each condition separately.
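library(dplyr)

DF %>%
  # convert every column whose name starts with "day" from text to Date;
  # blank strings ("") cannot be parsed and become NA
  mutate_at(vars(starts_with('day')), as.Date, "%d/%m/%Y") %>%
  mutate(admit_start_date = day_1,
         admit_end_date = case_when(
           !is.na(day_2) ~ day_2,                                  # Rule 1
           is.na(day_2) & Type %in% c('OUT', 'ANC', 'EM') ~ day_1, # Rule 2
           Type == 'IN' ~ day_1 + 5))                              # Rule 2, IN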
# Person_id Type Name day_1 day_2 admit_start_date admit_end_date
#1 1 IN Nara 2002-01-21 2002-01-23 2002-01-21 2002-01-23
#2 1 OUT Nara 2002-04-21 2002-04-21 2002-04-21 2002-04-21
#3 1 IN Nara 2002-06-21 <NA> 2002-06-21 2002-06-26
#4 1 ANC Nara 2002-09-21 <NA> 2002-09-21 2002-09-21
#5 2 IN Dora 2012-01-28 2012-01-30 2012-01-28 2012-01-30
#6 2 OUT Dora 2012-04-28 2012-04-28 2012-04-28 2012-04-28
#7 2 IN Dora 2012-06-28 <NA> 2012-06-28 2012-07-03
#8 2 ANC Dora 2012-09-28 2012-09-28 2012-09-28 2012-09-28
#9 3 EM Sara 2004-06-30 <NA> 2004-06-30 2004-06-30
#10 3 ANC Sara 2005-06-30 <NA> 2005-06-30 2005-06-30
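If you prefer to keep the if_else() structure from the question, a minimal repair could look like the sketch below. It assumes the same dd/mm/yyyy strings and keeps the question's function name transform_dates. The main fixes: R tests for missing values with is.na(x) (.isna() is pandas syntax), the logical constant is TRUE rather than True, string comparison is case-sensitive ('OUT', not 'Out'), and the new column must be assigned inside a mutate() call rather than after a pipe.

```r
library(dplyr)

DF <- data.frame(
  Person_id = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3),
  Type = c("IN", "OUT", "IN", "ANC", "IN", "OUT", "IN", "ANC", "EM", "ANC"),
  Name = c("Nara", "Nara", "Nara", "Nara", "Dora", "Dora", "Dora", "Dora",
           "Sara", "Sara"),
  day_1 = c("21/1/2002", "21/4/2002", "21/6/2002", "21/9/2002", "28/1/2012",
            "28/4/2012", "28/6/2012", "28/9/2012", "30/06/2004", "30/06/2005"),
  day_2 = c("23/1/2002", "21/4/2002", "", "", "30/1/2012", "28/4/2012", "",
            "28/9/2012", "", "")
)

transform_dates <- function(DF) {
  DF %>%
    # text -> Date, so that day_1 + 5 means "five days later"
    mutate_at(vars(starts_with("day")), ~ as.Date(.x, format = "%d/%m/%Y")) %>%
    mutate(admit_start_date = day_1,
           admit_end_date = day_2) %>%
    # the newly created admit_end_date can be checked in the next mutate()
    mutate(admit_end_date = if_else(
      is.na(admit_end_date),
      if_else(Type %in% c("OUT", "ANC", "EM"), day_1, day_1 + 5),
      admit_end_date
    ))
}

out <- transform_dates(DF)
print(out)
```

The nested if_else() only fills rows whose end date is missing; rows that already have a day_2 keep it unchanged.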
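The dates in the data frame are not of class "Date" (check with class(DF$day_1)); they are character strings (factors in R versions before 4.0). Using mutate_at, we change their class to "Date" so that we can do arithmetic on them, such as adding 5 days. Blank strings ("") cannot be parsed as dates and become NA, which is why a single is.na(day_2) check covers both blank and missing values. starts_with('day') selects every column whose name starts with "day", so both day_1 and day_2 are converted to "Date". We use mutate_at when we want to apply the same function to several columns.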
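In dplyr 1.0.0 and later, mutate_at() is superseded by across() inside mutate(). The sketch below shows the same conversion written with across() and checks what it does to the column classes, to the blank strings and to date arithmetic. DF_dates is an illustrative name; the expected values in the comments assume R 4.0 or later.

```r
library(dplyr)

DF <- data.frame(
  Person_id = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3),
  Type = c("IN", "OUT", "IN", "ANC", "IN", "OUT", "IN", "ANC", "EM", "ANC"),
  Name = c("Nara", "Nara", "Nara", "Nara", "Dora", "Dora", "Dora", "Dora",
           "Sara", "Sara"),
  day_1 = c("21/1/2002", "21/4/2002", "21/6/2002", "21/9/2002", "28/1/2012",
            "28/4/2012", "28/6/2012", "28/9/2012", "30/06/2004", "30/06/2005"),
  day_2 = c("23/1/2002", "21/4/2002", "", "", "30/1/2012", "28/4/2012", "",
            "28/9/2012", "", "")
)

class(DF$day_1)  # "character" before conversion (R >= 4.0)

# across() applies the same function to every column selected by starts_with()
DF_dates <- DF %>%
  mutate(across(starts_with("day"), ~ as.Date(.x, format = "%d/%m/%Y")))

class(DF_dates$day_1)  # "Date"
DF_dates$day_2[3]      # NA: the blank string could not be parsed
DF_dates$day_1[3] + 5  # "2002-06-26": adding a number to a Date adds days
```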
case_when is an alternative to nested ifelse statements. Its conditions are evaluated in order: for each row, the first condition that is TRUE determines the result, and the remaining conditions are not used for that row. If the first condition is not satisfied, the second one is checked, and so on. Hence no explicit else is required here: every row matches one of the three conditions, and a row that matched none would simply get NA. See ?case_when.
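A small sketch of the "first match wins" behaviour on a toy vector (x, res_cw, res_if and res_default are illustrative names). It also shows the equivalent nested ifelse() and the common TRUE ~ ... catch-all that plays the role of an else branch.

```r
library(dplyr)

x <- c(1, 5, 10, NA)

# 5 and 10 satisfy both conditions but take "big" from the first one;
# NA satisfies neither (NA conditions count as FALSE), so the result is NA
res_cw <- case_when(x > 3 ~ "big", x > 0 ~ "positive")
res_cw  # "positive" "big" "big" NA

# The same logic as nested ifelse() calls
res_if <- ifelse(x > 3, "big", ifelse(x > 0, "positive", NA))

# A final TRUE ~ ... acts as a catch-all "else"
res_default <- case_when(x > 3 ~ "big", x > 0 ~ "positive", TRUE ~ "other")
res_default  # "positive" "big" "big" "other"
```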