Calculate column value based on 2 dataframes


Problem description

I have two DataFrames: one has a single date column and the other has two date columns. Both have the same index, which is an ID.

My first question is just to confirm my understanding: if I compute across both DataFrames, will the rows that share the same index value be computed together?

My second question: I want to take the difference between the date in df1 and whichever date is present in df2, as in the following:

df1:

            Date1
 L-22     2015-03-12 
 L-15     2016-02-26

df2:

            Date2              Date3
 L-15     2016-01-11             NaT
 L-22        NaT              2017-01-08

I tried something like this, and it raises an error ('NaTType' object has no attribute 'notnull'):

      for i in df1.index:
         if df2['Date2'].ix[i].notnull():
            df1['Days_diff'] = df2['Date2'].sub(df1(train['Date1'], axis=0))
         elif df2['Date3'].ix[i].notnull():
            df1['Days_diff'] =df3['Date3'].sub(df1(train['Date1'], axis=0))

Any ideas? Thank you!

Solution

I think you need combine_first to fill the missing values (NaT) in Date2 with the values from Date3:

dates = df2.Date2.combine_first(df2.Date3)
#alternative solution
#dates = df2.Date2.fillna(df2.Date3)

print (dates)
L-15   2016-01-11
L-22   2017-01-08
Name: Date2, dtype: datetime64[ns]
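
For anyone who wants to reproduce this step, here is a minimal self-contained sketch. The way df2 is built is an assumption reconstructed from the sample data in the question; only combine_first and fillna come from the answer itself.

```python
import pandas as pd

# Sample data rebuilt from the question (the construction is assumed).
df2 = pd.DataFrame(
    {
        "Date2": [pd.Timestamp("2016-01-11"), pd.NaT],
        "Date3": [pd.NaT, pd.Timestamp("2017-01-08")],
    },
    index=["L-15", "L-22"],
)

# Take Date2 where it is present, otherwise fall back to Date3.
dates = df2["Date2"].combine_first(df2["Date3"])

# fillna with another Series (matched on the index) gives the same values here.
dates_alt = df2["Date2"].fillna(df2["Date3"])

print(dates)
```

Both calls match rows by index label, so it does not matter in which order the IDs appear in each column.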

Then subtract the values:

df1['Days_diff'] = dates.sub(df1['Date1'], axis=0)
print (df1)

          Date1  Days_diff
L-22 2015-03-12   668 days
L-15 2016-02-26   -46 days
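
On the first question: yes, arithmetic between two pandas Series matches rows by index label rather than by position, which is why the result is correct even though df1 and df2 list the IDs in a different order. Here is a minimal sketch, with both frames rebuilt from the question's sample data. The construction and the extra Days column are assumptions added for illustration.

```python
import pandas as pd

# Frames rebuilt from the question; note the different row order.
df1 = pd.DataFrame(
    {"Date1": [pd.Timestamp("2015-03-12"), pd.Timestamp("2016-02-26")]},
    index=["L-22", "L-15"],
)
df2 = pd.DataFrame(
    {
        "Date2": [pd.Timestamp("2016-01-11"), pd.NaT],
        "Date3": [pd.NaT, pd.Timestamp("2017-01-08")],
    },
    index=["L-15", "L-22"],
)

dates = df2["Date2"].combine_first(df2["Date3"])

# Subtraction and column assignment both align on the index labels.
df1["Days_diff"] = dates - df1["Date1"]

# Optional (hypothetical column name): integer day counts instead of Timedelta.
df1["Days"] = df1["Days_diff"].dt.days

print(df1)
```

The .dt.days step is only needed if plain integers are more convenient than Timedelta values for later calculations.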

Another solution is to use conditions, which gives the same output:

date2  = df2['Date2'].where(df2['Date2'].notnull()).sub(df1['Date1'], axis=0)
date3  = df2['Date3'].where(df2['Date3'].notnull()).sub(df1['Date1'], axis=0)
print (date2)
L-15   -46 days
L-22        NaT
dtype: timedelta64[ns]

print (date3)
L-15        NaT
L-22   668 days
dtype: timedelta64[ns]

df1['Days_diff'] = date2.combine_first(date3)
print (df1)
          Date1  Days_diff
L-22 2015-03-12   668 days
L-15 2016-02-26   -46 days
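
As for the original error: df2['Date2'].ix[i] returns a single scalar, and a scalar NaT has no notnull method, since that method belongs to Series and DataFrame objects. If a row-by-row check is really wanted, the top-level pd.notna function accepts scalars. The sketch below is only an illustration under some assumptions: the frames are rebuilt from the question's sample data, .loc is used in place of .ix (which was removed in pandas 1.0), and the diffs dictionary is a hypothetical helper. The vectorized solutions are still the better choice.

```python
import pandas as pd

# Frames rebuilt from the question (the construction is assumed).
df1 = pd.DataFrame(
    {"Date1": [pd.Timestamp("2015-03-12"), pd.Timestamp("2016-02-26")]},
    index=["L-22", "L-15"],
)
df2 = pd.DataFrame(
    {
        "Date2": [pd.Timestamp("2016-01-11"), pd.NaT],
        "Date3": [pd.NaT, pd.Timestamp("2017-01-08")],
    },
    index=["L-15", "L-22"],
)

diffs = {}  # hypothetical helper: ID -> Timedelta
for i in df1.index:
    d2 = df2.loc[i, "Date2"]  # scalar Timestamp or NaT
    d3 = df2.loc[i, "Date3"]
    if pd.notna(d2):          # pd.notna works on scalars
        diffs[i] = d2 - df1.loc[i, "Date1"]
    elif pd.notna(d3):
        diffs[i] = d3 - df1.loc[i, "Date1"]
    else:
        diffs[i] = pd.NaT     # neither date is available

# Assigning a Series aligns it on the index labels of df1.
df1["Days_diff"] = pd.Series(diffs)
print(df1)
```

Each iteration stores the difference for one row only, whereas the loop in the question reassigned the whole Days_diff column on every pass.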
