在r中按日期匹配和替换数据框中列的值 [英] Matching and Replacing values of a column in a dataframe by date in r
问题描述
干杯,
我有两个具有以下结构的数据框。
Cheers, I have two data frames with the following structure.
DF1:
Airlines HeadQ Date Cost_Index
American PHX 07-31-2016 220
American ATL 08-31-2016 150
American ATL 10-31-2016 150
Delta ATL 10-31-2016 180
American ATL 08-31-2017 200
第二个数据帧DF2具有以下结构:
Second data frame DF2 has the following structure:
DF2:
Airlines HeadQ Date
American ATL 09-30-2016
Delta ATL 03-31-2017
现在查找数据框DF1和DF2,我想更改DF1到以下数据框。
Now looking up with data frames DF1 and DF2, I would like to alter DF1 to the following data frame.
DF1:
Airlines HeadQ Date Cost_Index
American PHX 07-31-2016 220
American ATL 08-31-2016 0
American ATL 10-31-2016 150
Delta ATL 10-31-2016 180
American ATL 08-31-2017 200
条件是,从DF2查找DF1的航空公司和HeadQ,并且DF1 $ Date< DF2 $ Date然后将Cost_Index设为0,否则继续使用Cost_Index。
The condition is, lookup for Airlines and HeadQ of DF1 from DF2 and if DF1$Date < DF2$Date then make Cost_Index as 0 or else continue with Cost_Index.
我尝试使用
DF1$Cost_Index <- ifelse(DF1$Airlines == DF2$Airlines & DF1$HeadQ == DF2$HeadQ
& DF1$Date < DF2$Date, 0, DF1$Cost_Index)
Warning:
1: In DF1$Airlines == DF2$Airlines : longer object
length is not a multiple of shorter object length".
2: In<=.default(DF1$Date, DF2$Date) : longer object length is not a
multiple of shorter object length
DF1:
Airlines HeadQ Date Cost_Index
American PHX 07-31-2016 220
American ATL 08-31-2016 0
American ATL 10-31-2016 0
Delta ATL 10-31-2016 0
American ATL 08-31-2017 200
任何人都可以指出我向正确的方向前进?
Can anyone point me to right direction?
注意:
str(DF1$Date): Date, format: "2016-10-31"
str(DF2$Date): Date, format: "2016-08-31"
推荐答案
您可以左航空公司
和<$上的 DF1
和 DF2
的联接c $ c> HeadQ 列,然后根据 Date.x $ c来更改
Cost_Index
的值$ c>(来自 DF1
的日期)小于 Date.y
(来自DF2的日期)。使用 dplyr
语法(还要检查 Date.y
是否为 NA
根据您的逻辑在此处应用,如果DF2中没有匹配项,则 Cost_Index
应该保持不变):
You can do a left join of DF1
and DF2
on the Airlines
and HeadQ
columns, and then change the value of Cost_Index
based on if Date.x
(Date from DF1
) is smaller than Date.y
(Date from DF2). With dplyr
syntax (an additional check of if Date.y
is NA
is applied here according to your logic that if there is no match in DF2, the Cost_Index
should remain the same):
library(dplyr)
# convert the Date column to Date class for comparison
df1$Date <- as.Date(df1$Date, "%m-%d-%Y")
df2$Date <- as.Date(df2$Date, "%m-%d-%Y")
df1 %>%
left_join(df2, by = c("Airlines", "HeadQ")) %>%
mutate(Cost_Index = ifelse(Date.x < Date.y & !is.na(Date.y), 0, Cost_Index),
Date = Date.x) %>%
select(-Date.x, -Date.y)
# Airlines HeadQ Cost_Index Date
# 1 American PHX 220 2016-07-31
# 2 American ATL 0 2016-08-31
# 3 American ATL 150 2016-10-31
# 4 Delta ATL 180 2016-10-31
这篇关于在r中按日期匹配和替换数据框中列的值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!