Calculate column value based on 2 dataframes
I have 2 data frames: one has a single date column and the other has 2 date columns. Both have the same index, which is an ID.
My first question is just to check my understanding: if I compute across both dataframes, will the rows that share the same index be computed together?
My second question: I want to take the difference between the date in df1 and whichever of the two dates in df2 is present, as in the following:
df1:
          Date1
L-22 2015-03-12
L-15 2016-02-26
df2:
          Date2      Date3
L-15 2016-01-11        NaT
L-22        NaT 2017-01-08
I tried something like this, but it raises an error ('NaTType' object has no attribute 'notnull'):
for i in df1.index:
    if df2['Date2'].ix[i].notnull():
        df1['Days_diff'] = df2['Date2'].sub(df1(train['Date1'], axis=0))
    elif df2['Date3'].ix[i].notnull():
        df1['Days_diff'] = df3['Date3'].sub(df1(train['Date1'], axis=0))
Any ideas? Thank you!
I think you need combine_first to replace the missing values (NaT) in one column with the values from the other column:
dates = df2.Date2.combine_first(df2.Date3)
#alternative solution
#dates = df2.Date2.fillna(df2.Date3)
print (dates)
L-15 2016-01-11
L-22 2017-01-08
Name: Date2, dtype: datetime64[ns]
Then subtract the values:
df1['Days_diff'] = dates.sub(df1['Date1'], axis=0)
print (df1)
          Date1 Days_diff
L-22 2015-03-12  668 days
L-15 2016-02-26  -46 days
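For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds df1 and df2 from the sample data in the question (the construction code is my assumption; the question only shows the printed frames). It also speaks to the first question: the rows of df2 are deliberately in a different order than those of df1, and the subtraction still pairs them up by index label rather than by position.

```python
import pandas as pd

# Rebuild the sample frames from the question (construction is an assumption;
# only the printed output was given). Note the two indexes are ordered differently.
df1 = pd.DataFrame(
    {"Date1": pd.to_datetime(["2015-03-12", "2016-02-26"])},
    index=["L-22", "L-15"],
)
df2 = pd.DataFrame(
    {
        "Date2": pd.to_datetime(["2016-01-11", None]),
        "Date3": pd.to_datetime([None, "2017-01-08"]),
    },
    index=["L-15", "L-22"],
)

# Take Date2 where it is present, otherwise fall back to Date3.
dates = df2["Date2"].combine_first(df2["Date3"])

# Series arithmetic aligns on index labels, so L-22 is matched with L-22
# and L-15 with L-15 even though the row order differs.
df1["Days_diff"] = dates.sub(df1["Date1"])

print(df1)
```

The assignment back into df1 also aligns on the index, so each difference lands on the row with the matching ID.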
Another solution is to use conditions, but the output appears to be the same:
date2 = df2['Date2'].where(df2['Date2'].notnull()).sub(df1['Date1'], axis=0)
date3 = df2['Date3'].where(df2['Date3'].notnull()).sub(df1['Date1'], axis=0)
print (date2)
L-15 -46 days
L-22 NaT
dtype: timedelta64[ns]
print (date3)
L-15 NaT
L-22 668 days
dtype: timedelta64[ns]
df1['Days_diff'] = date2.combine_first(date3)
print (df1)
          Date1 Days_diff
L-22 2015-03-12  668 days
L-15 2016-02-26  -46 days
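As for the error in the original loop: selecting a single cell returns a scalar (a Timestamp or NaT), and notnull() is a method of Series and DataFrame, not of scalars. For a scalar, the top-level function pd.notna (or its alias pd.notnull) can be used instead. The sketch below is a row-by-row version along those lines, mainly for illustration; the vectorized combine_first approach above is shorter and will usually be faster. It uses .loc because .ix has been removed from recent pandas versions, and the frame construction and the diffs dictionary are my own assumptions, not part of the question.

```python
import pandas as pd

# Sample frames rebuilt from the question (construction is an assumption).
df1 = pd.DataFrame(
    {"Date1": pd.to_datetime(["2015-03-12", "2016-02-26"])},
    index=["L-22", "L-15"],
)
df2 = pd.DataFrame(
    {
        "Date2": pd.to_datetime(["2016-01-11", None]),
        "Date3": pd.to_datetime([None, "2017-01-08"]),
    },
    index=["L-15", "L-22"],
)

# Hypothetical helper dict: collects one difference per ID.
diffs = {}
for i in df1.index:
    d2 = df2.loc[i, "Date2"]  # scalar: Timestamp or NaT
    d3 = df2.loc[i, "Date3"]
    # pd.notna works on scalars; the .notnull() method does not exist on NaT.
    if pd.notna(d2):
        diffs[i] = d2 - df1.loc[i, "Date1"]
    elif pd.notna(d3):
        diffs[i] = d3 - df1.loc[i, "Date1"]

# Build a Series keyed by ID and assign it; assignment aligns on the index.
df1["Days_diff"] = pd.Series(diffs)

print(df1)
```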