在r中按日期匹配和替换数据框中列的值 [英] Matching and Replacing values of a column in a dataframe by date in r

查看:152
本文介绍了在r中按日期匹配和替换数据框中列的值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

干杯,
我有两个具有以下结构的数据框。

Cheers, I have two data frames with the following structure.

DF1:
Airlines           HeadQ      Date           Cost_Index
American           PHX        07-31-2016     220
American           ATL        08-31-2016     150
American           ATL        10-31-2016     150
Delta              ATL        10-31-2016     180
American           ATL        08-31-2017     200

第二个数据帧DF2具有以下结构:

Second data frame DF2 has the following structure:

DF2:
Airlines           HeadQ      Date          
American           ATL        09-30-2016
Delta              ATL        03-31-2017

现在查找数据框DF1和DF2,我想更改DF1到以下数据框。

Now looking up with data frames DF1 and DF2, I would like to alter DF1 to the following data frame.

DF1:
Airlines           HeadQ      Date           Cost_Index
American           PHX        07-31-2016     220
American           ATL        08-31-2016     0
American           ATL        10-31-2016     150
Delta              ATL        10-31-2016     180
American           ATL        08-31-2017     200

条件是,从DF2查找DF1的航空公司和HeadQ,并且DF1 $ Date< DF2 $ Date然后将Cost_Index设为0,否则继续使用Cost_Index。

The condition is, lookup for Airlines and HeadQ of DF1 from DF2 and if DF1$Date < DF2$Date then make Cost_Index as 0 or else continue with Cost_Index.

我尝试使用

DF1$Cost_Index <- ifelse(DF1$Airlines == DF2$Airlines & DF1$HeadQ == DF2$HeadQ 
        & DF1$Date < DF2$Date, 0, DF1$Cost_Index)


Warning:
1: In DF1$Airlines == DF2$Airlines : longer object
length is not a multiple of shorter object length". 
2: In<=.default(DF1$Date, DF2$Date) : longer object length is not a
multiple of shorter object length

DF1:
Airlines           HeadQ      Date           Cost_Index
American           PHX        07-31-2016     220
American           ATL        08-31-2016     0
American           ATL        10-31-2016     0
Delta              ATL        10-31-2016     0
American           ATL        08-31-2017     200

任何人都可以指出我向正确的方向前进?

Can anyone point me to right direction?

注意:

str(DF1$Date): Date, format: "2016-10-31"
str(DF2$Date): Date, format: "2016-08-31"


推荐答案

您可以左航空公司和<$上的 DF1 DF2 的联接c $ c> HeadQ 列,然后根据 Date.x Cost_Index 的值$ c>(来自 DF1 的日期)小于 Date.y (来自DF2的日期)。使用 dplyr 语法(还要检查 Date.y 是否为 NA 根据您的逻辑在此处应用,如果DF2中没有匹配项,则 Cost_Index 应该保持不变):

You can do a left join of DF1 and DF2 on the Airlines and HeadQ columns, and then change the value of Cost_Index based on if Date.x(Date from DF1) is smaller than Date.y(Date from DF2). With dplyr syntax (an additional check of if Date.y is NA is applied here according to your logic that if there is no match in DF2, the Cost_Index should remain the same):

library(dplyr)

# convert the Date column to Date class for comparison
df1$Date <- as.Date(df1$Date, "%m-%d-%Y")
df2$Date <- as.Date(df2$Date, "%m-%d-%Y")

df1 %>% 
       left_join(df2, by = c("Airlines", "HeadQ")) %>% 
       mutate(Cost_Index = ifelse(Date.x < Date.y & !is.na(Date.y), 0, Cost_Index), 
              Date = Date.x) %>% 
       select(-Date.x, -Date.y)

#   Airlines HeadQ Cost_Index       Date
# 1 American   PHX        220 2016-07-31
# 2 American   ATL          0 2016-08-31
# 3 American   ATL        150 2016-10-31
# 4    Delta   ATL        180 2016-10-31

这篇关于在r中按日期匹配和替换数据框中列的值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆