Pandas: Merge data frames on datetime index
Question
I have the following two dataframes, on which I have set the date column as a DatetimeIndex using
df.set_index(pd.to_datetime(df['date']), inplace=True)
and I would like to merge or join them on date:
df.head(5)
           catcode_amt type feccandid_amt  amount
date
1915-12-31       A5000  24K     H6TX08100    1000
1916-12-31       T6100  24K     H8CA52052     500
1954-12-31       H3100  24K     S8AK00090    1000
1985-12-31       J7120  24E     H8OH18088      36
1997-12-31       z9600  24K     S6ND00058    2000
d.head(5)
           catcode_disp disposition          feccandid_disp  bills
date
2007-12-31        A0000     support               S4HI00011      1
2007-12-31        A1000      oppose  S4IA00020', 'P20000741      1
2007-12-31        A1000     support               S8MT00010      1
2007-12-31        A1500     support               S6WI00061      2
2007-12-31        A1600     support  S4IA00020', 'P20000741      3
I have tried the following two methods, but both return a MemoryError:
df.join(d, how='right')
I use the code below on dataframes that don't have date set as the index:
merge = pd.merge(df, d, how='inner', on='date')
Recommended answer
If you need to merge on the indexes, you can pass the parameters left_index=True and right_index=True to the merge function:
merge = pd.merge(df, d, how='inner', left_index=True, right_index=True)
Sample (the first index value in d was changed so that it matches a date in df):
print(df)
           catcode_amt type feccandid_amt  amount
date
1915-12-31       A5000  24K     H6TX08100    1000
1916-12-31       T6100  24K     H8CA52052     500
1954-12-31       H3100  24K     S8AK00090    1000
1985-12-31       J7120  24E     H8OH18088      36
1997-12-31       z9600  24K     S6ND00058    2000
print(d)
           catcode_disp disposition            feccandid_disp  bills
date
1997-12-31        A0000     support                 S4HI00011    1.0
2007-12-31        A1000      oppose  S4IA00020', 'P20000741 1    NaN
2007-12-31        A1000     support                 S8MT00010    1.0
2007-12-31        A1500     support                 S6WI00061    2.0
2007-12-31        A1600     support  S4IA00020', 'P20000741 3    NaN
merge = pd.merge(df, d, how='inner', left_index=True, right_index=True)
print(merge)
           catcode_amt type feccandid_amt  amount catcode_disp disposition  \
date
1997-12-31       z9600  24K     S6ND00058    2000        A0000     support

           feccandid_disp  bills
date
1997-12-31      S4HI00011    1.0
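For anyone who wants to run the index merge end to end, here is a minimal sketch. It assumes cut-down stand-ins for df and d (only a few columns and rows taken from the sample above), and it also shows that df.join(d, how='inner') should give the same rows, since join aligns on the index by default.

```python
import pandas as pd

# Cut-down stand-ins for df and d from the sample (not the full data).
df = pd.DataFrame(
    {
        "date": ["1915-12-31", "1916-12-31", "1954-12-31", "1985-12-31", "1997-12-31"],
        "catcode_amt": ["A5000", "T6100", "H3100", "J7120", "z9600"],
        "amount": [1000, 500, 1000, 36, 2000],
    }
)
d = pd.DataFrame(
    {
        "date": ["1997-12-31", "2007-12-31", "2007-12-31"],
        "catcode_disp": ["A0000", "A1000", "A1500"],
        "disposition": ["support", "oppose", "support"],
    }
)

# Same indexing step as in the question, written without inplace=True.
# set_index with a Series keeps the original column, so drop it afterwards.
df = df.set_index(pd.to_datetime(df["date"])).drop(columns="date")
d = d.set_index(pd.to_datetime(d["date"])).drop(columns="date")

# Merge on both indexes; only 1997-12-31 exists on both sides.
merged = pd.merge(df, d, how="inner", left_index=True, right_index=True)

# DataFrame.join aligns on the index by default.
joined = df.join(d, how="inner")

print(merged)
```

Only the 1997-12-31 row survives the inner merge here, as in the sample output.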
Or you can use concat:
print(pd.concat([df, d], join='inner', axis=1))
           catcode_amt type feccandid_amt  amount catcode_disp disposition  \
date
1997-12-31       z9600  24K     S6ND00058    2000        A0000     support

           feccandid_disp  bills
date
1997-12-31      S4HI00011    1.0
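The concat variant can be tried on its own with a small sketch like the one below. The frames left and right are hypothetical two-row stand-ins, and both deliberately have unique index values; I have not checked how concat behaves with duplicated dates across pandas versions, so treat this as an illustration of the unique-index case only.

```python
import pandas as pd

# Hypothetical stand-ins with unique DatetimeIndex values on both sides.
left = pd.DataFrame(
    {"amount": [36, 2000]},
    index=pd.to_datetime(["1985-12-31", "1997-12-31"]),
)
right = pd.DataFrame(
    {"bills": [1.0, 2.0]},
    index=pd.to_datetime(["1997-12-31", "2007-12-31"]),
)
left.index.name = "date"
right.index.name = "date"

# axis=1 places the columns side by side; join='inner' keeps only
# the index labels that appear in both frames.
side_by_side = pd.concat([left, right], join="inner", axis=1)

print(side_by_side)
```

The result keeps the single shared date, 1997-12-31, with the columns of both frames next to each other.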
EdChum is right: I added duplicates to DataFrame df (the last 2 values in the index), and the merge then pairs each duplicated date in df with every matching row in d:
print(df)
           catcode_amt type feccandid_amt  amount
date
1915-12-31       A5000  24K     H6TX08100    1000
1916-12-31       T6100  24K     H8CA52052     500
1954-12-31       H3100  24K     S8AK00090    1000
2007-12-31       J7120  24E     H8OH18088      36
2007-12-31       z9600  24K     S6ND00058    2000
print(d)
           catcode_disp disposition            feccandid_disp  bills
date
1997-12-31        A0000     support                 S4HI00011    1.0
2007-12-31        A1000      oppose  S4IA00020', 'P20000741 1    NaN
2007-12-31        A1000     support                 S8MT00010    1.0
2007-12-31        A1500     support                 S6WI00061    2.0
2007-12-31        A1600     support  S4IA00020', 'P20000741 3    NaN
merge = pd.merge(df, d, how='inner', left_index=True, right_index=True)
print(merge)
           catcode_amt type feccandid_amt  amount catcode_disp disposition  \
date
2007-12-31       J7120  24E     H8OH18088      36        A1000      oppose
2007-12-31       J7120  24E     H8OH18088      36        A1000     support
2007-12-31       J7120  24E     H8OH18088      36        A1500     support
2007-12-31       J7120  24E     H8OH18088      36        A1600     support
2007-12-31       z9600  24K     S6ND00058    2000        A1000      oppose
2007-12-31       z9600  24K     S6ND00058    2000        A1000     support
2007-12-31       z9600  24K     S6ND00058    2000        A1500     support
2007-12-31       z9600  24K     S6ND00058    2000        A1600     support

                      feccandid_disp  bills
date
2007-12-31  S4IA00020', 'P20000741 1    NaN
2007-12-31                 S8MT00010    1.0
2007-12-31                 S6WI00061    2.0
2007-12-31  S4IA00020', 'P20000741 3    NaN
2007-12-31  S4IA00020', 'P20000741 1    NaN
2007-12-31                 S8MT00010    1.0
2007-12-31                 S6WI00061    2.0
2007-12-31  S4IA00020', 'P20000741 3    NaN
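Because every duplicated date on one side is paired with every matching row on the other, the result can grow far beyond either input, which is a plausible cause of the MemoryError on the full data. Below is a rough sketch for estimating the size of an inner merge before running it. The helper expected_inner_rows is hypothetical (it is not part of pandas), and the frames are cut-down versions of the sample above.

```python
import pandas as pd

# Cut-down versions of the sample frames, keeping one column each.
df = pd.DataFrame(
    {"amount": [1000, 500, 1000, 36, 2000]},
    index=pd.to_datetime(
        ["1915-12-31", "1916-12-31", "1954-12-31", "2007-12-31", "2007-12-31"]
    ),
)
d = pd.DataFrame(
    {"bills": [1.0, None, 1.0, 2.0, None]},
    index=pd.to_datetime(
        ["1997-12-31", "2007-12-31", "2007-12-31", "2007-12-31", "2007-12-31"]
    ),
)


def expected_inner_rows(left, right):
    """Hypothetical helper: row count of an inner merge on the index."""
    left_counts = left.index.value_counts()
    right_counts = right.index.value_counts()
    # Labels missing on one side are filled with 0 and contribute no rows.
    return int(left_counts.mul(right_counts, fill_value=0).sum())


print(df.index.is_unique)          # False
print(expected_inner_rows(df, d))  # 8

merged = pd.merge(df, d, how="inner", left_index=True, right_index=True)
print(len(merged))                 # 8
```

Here 2007-12-31 appears 2 times in df and 4 times in d, so the merge returns 2 x 4 = 8 rows. If a many-to-many match is not intended, merge also accepts a validate argument (for example validate='one_to_many'), which raises an error rather than silently building the product.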