Mutate based on two conditions in R dataframe

Problem description

I have an R data frame that can be generated with the code below:

DF <- data.frame("Person_id" = c(1,1,1,1,2,2,2,2,3,3), "Type" = c("IN","OUT","IN","ANC","IN","OUT","IN","ANC","EM","ANC"), "Name" = c("Nara","Nara","Nara","Nara","Dora","Dora","Dora","Dora","Sara","Sara"),"day_1" = c("21/1/2002","21/4/2002","21/6/2002","21/9/2002","28/1/2012","28/4/2012","28/6/2012","28/9/2012","30/06/2004","30/06/2005"),"day_2" = c("23/1/2002","21/4/2002","","","30/1/2012","28/4/2012","","28/9/2012","",""))

I would like to create two new columns, admit_start_date and admit_end_date, based on the conditions given below.

Rule 1

  admit_start_date = day_1
  admit_end_date   = day_2 (sometimes day_2 can be NA. So refer Rule 2 below)

Rule 2

   if day_2 is (null or blank or NA) and Type is (OUT or ANC or EM) then
         admit_end_date = day_1
   else if day_2 is (null or blank or NA) and Type is IN then
         admit_end_date = day_1 + 5 (days)

This is what I have tried, but it does not work:

    transform_dates = function(DF){  # this function is to create 'date' columns  
  DF %>% 
    mutate(admit_start_date = day_1) %>% 
    mutate(admit_end_date = day_2) %>%
    admit_end_date = if_else(((Type == 'Out' & admit_end_date.isna() ==True|Type == 'ANC' & admit_end_date.isna() ==True|Type == 'EM' & admit_end_date.isna() ==True),day_1,day_1 + 5)
    )
}  

As you can see, I am not sure how to check for NA in a newly created column and replace those NAs with day_1 or day_1 + 5 (days) depending on the Type column.

Can you help?

I expect my output to look like the one shown below.

Recommended answer

After converting the "day" columns to actual date objects, we can use case_when to specify each condition separately.

library(dplyr)

DF %>%
  # Parse every column whose name starts with "day" as a Date (blank strings become NA)
  mutate_at(vars(starts_with('day')), as.Date, "%d/%m/%Y") %>%
  mutate(admit_start_date = day_1,
         admit_end_date = case_when(
           !is.na(day_2) ~ day_2,                                   # Rule 1
           is.na(day_2) & Type %in% c('OUT', 'ANC', 'EM') ~ day_1,  # Rule 2
           Type == 'IN' ~ day_1 + 5))                               # Rule 2, Type IN


#  Person_id Type Name      day_1      day_2 admit_start_date admit_end_date
#1          1   IN Nara 2002-01-21 2002-01-23       2002-01-21     2002-01-23
#2          1  OUT Nara 2002-04-21 2002-04-21       2002-04-21     2002-04-21
#3          1   IN Nara 2002-06-21       <NA>       2002-06-21     2002-06-26
#4          1  ANC Nara 2002-09-21       <NA>       2002-09-21     2002-09-21
#5          2   IN Dora 2012-01-28 2012-01-30       2012-01-28     2012-01-30
#6          2  OUT Dora 2012-04-28 2012-04-28       2012-04-28     2012-04-28
#7          2   IN Dora 2012-06-28       <NA>       2012-06-28     2012-07-03
#8          2  ANC Dora 2012-09-28 2012-09-28       2012-09-28     2012-09-28
#9          3   EM Sara 2004-06-30       <NA>       2004-06-30     2004-06-30
#10         3  ANC Sara 2005-06-30       <NA>       2005-06-30     2005-06-30
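
Since the question wrapped its attempt in a function called transform_dates, the same pipeline can be put into a function of that name. This is only a sketch: the three-row sample data frame is a cut-down, illustrative subset of the question's DF, and the argument name df is an assumption.

```r
library(dplyr)

# Sketch: the answer's pipeline wrapped in a function named after the asker's attempt.
transform_dates <- function(df) {
  df %>%
    # Convert the "day" columns from text to Date; blank strings become NA
    mutate_at(vars(starts_with("day")), as.Date, format = "%d/%m/%Y") %>%
    mutate(
      admit_start_date = day_1,
      admit_end_date = case_when(
        !is.na(day_2) ~ day_2,                                   # day_2 present
        is.na(day_2) & Type %in% c("OUT", "ANC", "EM") ~ day_1,  # day_2 missing, not IN
        Type == "IN" ~ day_1 + 5                                 # day_2 missing, IN
      )
    )
}

# Illustrative subset of the question's data (rows 1, 3 and 4)
sample_df <- data.frame(
  Type  = c("IN", "IN", "ANC"),
  day_1 = c("21/1/2002", "21/6/2002", "21/9/2002"),
  day_2 = c("23/1/2002", "", ""),
  stringsAsFactors = FALSE
)

result <- transform_dates(sample_df)
result$admit_end_date
```

The end dates for these three rows should match rows 1, 3 and 4 of the output above.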

The dates in the data frame are not of class "Date" (check with class(DF$day_1)), so we use mutate_at to convert them to "Date", which lets us do arithmetic on them. starts_with('day') means that every column whose name starts with "day" is converted to class "Date". mutate_at is useful when we want to apply the same function to multiple columns.
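
In dplyr 1.0.0 and later, mutate_at is superseded by across(). The sketch below shows the same conversion written that way; it is an alternative, not the code from the answer, and the two-row data frame is made up for illustration.

```r
library(dplyr)

# Small illustrative data frame; the dates are stored as text
df <- data.frame(
  Type  = c("IN", "OUT"),
  day_1 = c("21/1/2002", "21/4/2002"),
  day_2 = c("23/1/2002", ""),
  stringsAsFactors = FALSE
)

# across() applies the same function to every column selected by starts_with("day")
converted <- df %>%
  mutate(across(starts_with("day"), ~ as.Date(.x, format = "%d/%m/%Y")))

class(converted$day_1)  # the column is now of class "Date"
is.na(converted$day_2)  # the blank string was parsed as NA
converted$day_1 + 5     # date arithmetic now works: adds five days
```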

case_when is an alternative to nested ifelse statements. Its conditions are considered in order: for each row, the first condition that is TRUE determines the result, and later conditions are ignored for that row. If the first condition is not satisfied, the second is considered, and so on, so no explicit else is required here. If none of the conditions is satisfied, the result is NA. See ?case_when.
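
A small sketch of this ordering, using a made-up numeric vector x rather than the question's data:

```r
library(dplyr)

x <- c(1, 5, 10, NA)

labels <- case_when(
  x > 3 ~ "big",       # considered first
  x > 0 ~ "positive"   # used only where the first condition is not TRUE
)

labels
# 5 and 10 satisfy both conditions but get "big", because the first match wins.
# 1 fails the first condition and gets "positive".
# NA satisfies neither condition, so the result for it is NA.
```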
