通过多个条件匹配和替换数据帧的列 [英] Match and replace columns of dataframe by multiple conditions

查看:91
本文介绍了通过多个条件匹配和替换数据帧的列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

干杯,
我有两个数据框架,结构如下:

  DF1:
航空公司HeadQ日期Cost_Index
美国PHX 07-31-2016 220
美式ATL 08-31-2016 150
美式ATL 10-31-2016 150
Delta ATL 10-31-2016 180
美式ATL 08-31-2017 200

第二数据帧DF2具有以下结构:

  DF2:
航空公司头号日期
美式ATL 09-30-2016
Delta ATL 03-31-2017

现在查找数据帧DF1和DF2,我想改变DF1到以下数据框架。

  DF1:
航空公司HeadQ日期Cost_Index
美国PHX 07-31 -2016 220
美国ATL 08-31-2016 0
美国ATL 10-31-2016 150
达美ATL 10-31-2016 180
美式ATL 08-31-2017 200


我尝试使用以下方式失败:

  DF1 $ Cost_Index<  -  ifelse(DF1 $航空公司== DF2 $航空公司和DF1 $ HeadQ == DF2 $ HeadQ 
& DF1 $日期< DF2 $日期,0,DF1 $ Cost_Index)


警告:
1:在DF1 $航空== DF2 $航空:更长的对象
长度不是更短的倍数对象长度
2:In< =。default(DF1 $ Date,DF2 $ Date):较长的对象长度不是较短对象长度的
倍数

DF1:
航空公司HeadQ日期Cost_Index
美国PHX 07-31-2016 220
美式ATL 08-31-2016 0
美式ATL 10-31-2016 0
Delta ATL 10-31-2016 0
美国ATL 08-31-2017 200

任何人都可以点我正确的方向?



注意:

  str(DF1 $ Date):日期格式:2016-10-31
str(DF2 $ Date):日期格式:


解决方案

data.table,v1.9.7的当前开发版本中的条件连接功能,我会请执行以下操作:

  require(data.table)#v1.9.7 
#convert to data.tables,和日期列到Date类。
setDT(df1)[,Date:= as.Date(Date,format =%m-%d-%Y)]
setDT(df2)[,Date:= as.Date日期,格式=%m-%d-%Y)]

df1 [df2,on =。(航空公司,HeadQ,Date< Date),#根据条件$查找匹配行b $ b Cost_Index:= 0L]#更新列为0为这些行

df1
#航空公司HeadQ日期Cost_Index
#1:美国PHX 2016-07-31 220
#2:美式ATL 2016-08-31 0
#3:美式ATL 2016-10-31 150
#4:Delta ATL 2016-10-31 180

您可以按照这些说明


Cheers, I have two data frames with the following structure.

DF1:
Airlines           HeadQ      Date           Cost_Index
American           PHX        07-31-2016     220
American           ATL        08-31-2016     150
American           ATL        10-31-2016     150
Delta              ATL        10-31-2016     180
American           ATL        08-31-2017     200

Second data frame DF2 has the following structure:

DF2:
Airlines           HeadQ      Date          
American           ATL        09-30-2016
Delta              ATL        03-31-2017

Now looking up with data frames DF1 and DF2, I would like to alter DF1 to the following data frame.

DF1:
Airlines           HeadQ      Date           Cost_Index
American           PHX        07-31-2016     220
American           ATL        08-31-2016     0
American           ATL        10-31-2016     150
Delta              ATL        10-31-2016     180
American           ATL        08-31-2017     200

The condition is, lookup for Airlines and HeadQ of DF1 from DF2 and if DF1$Date < DF2$Date then make Cost_Index as 0 or else continue with Cost_Index.

I tried, unsuccessfully, with:

DF1$Cost_Index <- ifelse(DF1$Airlines == DF2$Airlines & DF1$HeadQ == DF2$HeadQ 
        & DF1$Date < DF2$Date, 0, DF1$Cost_Index)


Warning:
1: In DF1$Airlines == DF2$Airlines : longer object
length is not a multiple of shorter object length". 
2: In<=.default(DF1$Date, DF2$Date) : longer object length is not a
multiple of shorter object length

DF1:
Airlines           HeadQ      Date           Cost_Index
American           PHX        07-31-2016     220
American           ATL        08-31-2016     0
American           ATL        10-31-2016     0
Delta              ATL        10-31-2016     0
American           ATL        08-31-2017     200

Can anyone point me to right direction?

Note:

str(DF1$Date): Date, format: "2016-10-31"
str(DF2$Date): Date, format: "2016-08-31"

解决方案

Using the recently implemented conditional joins feature in the current development version of data.table, v1.9.7, I'd do this as follows:

require(data.table) # v1.9.7
# convert to data.tables, and Date column to Date class.
setDT(df1)[, Date := as.Date(Date, format = "%m-%d-%Y")]
setDT(df2)[, Date := as.Date(Date, format = "%m-%d-%Y")]

df1[df2, on = .(Airlines, HeadQ, Date < Date), # find matching rows based on condition
      Cost_Index := 0L]                        # update column with 0 for those rows

df1
#    Airlines HeadQ       Date Cost_Index
# 1: American   PHX 2016-07-31        220
# 2: American   ATL 2016-08-31          0
# 3: American   ATL 2016-10-31        150
# 4:    Delta   ATL 2016-10-31        180

You can install the dev version by following these instructions.

这篇关于通过多个条件匹配和替换数据帧的列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆