Merge pandas dataframe on time and another column

Question

I have two pandas dataframes that I'm trying to combine into a single dataframe. Here's how I set them up:

import pandas as pd

a = {'date':['1/1/2015 00:00','1/1/2015 00:15','1/1/2015 00:30'], 'num':[1,2,3]}
b = {'date':['1/1/2015 01:15','1/1/2015 01:30','1/1/2015 01:45'], 'num':[4,5,6]}

dfa = pd.DataFrame(a)
dfb = pd.DataFrame(b)

# Parse the date strings into datetime64 values (vectorized, no apply needed)
dfa['date'] = pd.to_datetime(dfa['date'])
dfb['date'] = pd.to_datetime(dfb['date'])

I then find the earliest and latest timestamps across both dataframes, and create a new dataframe that starts as just a date series:

earliest = min(dfa['date'].min(), dfb['date'].min())
latest = max(dfa['date'].max(), dfb['date'].max())

date_range = pd.date_range(earliest, latest, freq='15min')

dfd = pd.DataFrame({'date':date_range})

I then want to merge them all into a single dataframe, with dfd as the base since it will contain all of the proper timestamps. So I merge dfd and dfa and all is good:

dfd = pd.merge(dfd, dfa, how = 'outer', on = 'date')

However, when I merge it with dfb the date series gets screwy and I can't figure out why.

dfd = pd.merge(dfd, dfb, how = 'outer', on = ['date','num'])

...yields:

                  date  num
0  2015-01-01 00:00:00  1.0
1  2015-01-01 00:15:00  2.0
2  2015-01-01 00:30:00  3.0
3  2015-01-01 00:45:00  NaN
4  2015-01-01 01:00:00  NaN
5  2015-01-01 01:15:00  NaN
6  2015-01-01 01:30:00  NaN
7  2015-01-01 01:45:00  NaN
8  2015-01-01 01:15:00  4.0
9  2015-01-01 01:30:00  5.0
10 2015-01-01 01:45:00  6.0

I would expect 4.0 to fill in the 2015-01-01 01:15:00 time slot, and so on, without creating new rows.

Alternatively, if I try:

dfd = pd.merge(dfd, dfb, how = 'outer', on = 'date')

I get:

                 date  num_x  num_y
0 2015-01-01 00:00:00    1.0    NaN
1 2015-01-01 00:15:00    2.0    NaN
2 2015-01-01 00:30:00    3.0    NaN
3 2015-01-01 00:45:00    NaN    NaN
4 2015-01-01 01:00:00    NaN    NaN
5 2015-01-01 01:15:00    NaN    4.0
6 2015-01-01 01:30:00    NaN    5.0
7 2015-01-01 01:45:00    NaN    6.0

which is also not what I want (I just want a single num column). Any help would be appreciated.

Recommended answer

# '15min' replaces the '15T' alias, which is deprecated in recent pandas versions
dfa.set_index('date').combine_first(dfb.set_index('date')) \
    .asfreq('15min').reset_index()

                 date  num
0 2015-01-01 00:00:00  1.0
1 2015-01-01 00:15:00  2.0
2 2015-01-01 00:30:00  3.0
3 2015-01-01 00:45:00  NaN
4 2015-01-01 01:00:00  NaN
5 2015-01-01 01:15:00  4.0
6 2015-01-01 01:30:00  5.0
7 2015-01-01 01:45:00  6.0
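
Some background on why the original merge misbehaved: an outer merge on ['date','num'] treats each (date, num) pair as the key, so the row (01:15, NaN) already in dfd does not match (01:15, 4.0) from dfb, and both rows are kept. Aligning on date alone avoids that. The following is a minimal self-contained sketch of the combine_first approach, rebuilt from the question's data; the variable name combined is only illustrative.

```python
import pandas as pd

# Rebuild the two frames from the question
dfa = pd.DataFrame({
    'date': pd.to_datetime(['1/1/2015 00:00', '1/1/2015 00:15', '1/1/2015 00:30']),
    'num': [1, 2, 3],
})
dfb = pd.DataFrame({
    'date': pd.to_datetime(['1/1/2015 01:15', '1/1/2015 01:30', '1/1/2015 01:45']),
    'num': [4, 5, 6],
})

combined = (
    dfa.set_index('date')
       # Take values from dfa, falling back to dfb where dfa has none
       .combine_first(dfb.set_index('date'))
       # Reindex onto a regular 15-minute grid; missing slots become NaN
       .asfreq('15min')
       .reset_index()
)

print(combined)
```

Because asfreq builds the 15-minute grid from the earliest to the latest timestamp in the index, the separate date_range frame from the question is not needed here.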


Another solution

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows instead
pd.concat([dfa, dfb]).set_index('date').asfreq('15min').reset_index()
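
For completeness, here is a self-contained sketch of the stacking approach written with pd.concat (DataFrame.append is no longer available in pandas 2.0 and later). It assumes, as in the question's data, that no timestamp appears in both frames; asfreq reindexes under the hood and raises an error on a duplicated index. The names stacked and result are illustrative.

```python
import pandas as pd

dfa = pd.DataFrame({
    'date': pd.to_datetime(['1/1/2015 00:00', '1/1/2015 00:15', '1/1/2015 00:30']),
    'num': [1, 2, 3],
})
dfb = pd.DataFrame({
    'date': pd.to_datetime(['1/1/2015 01:15', '1/1/2015 01:30', '1/1/2015 01:45']),
    'num': [4, 5, 6],
})

# Stack the rows of both frames into one long frame
stacked = pd.concat([dfa, dfb], ignore_index=True)

# Assumption: timestamps are unique across the two frames.
# If they can overlap, combine_first (or dropping duplicates first) is safer.
assert not stacked['date'].duplicated().any()

# Put the rows on a regular 15-minute grid; empty slots become NaN
result = stacked.set_index('date').asfreq('15min').reset_index()

print(result)
```

If the two frames can share timestamps, combine_first gives a defined precedence (the calling frame wins), which is one reason to prefer it over plain stacking.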
