Pandas: merge_asof() sum multiple rows / don't duplicate

Question

I'm working with two data sets that have different dates associated with each. I want to merge them, but because the dates are not exact matches, I believe merge_asof() is the best way to go.

However, two things happen with a merge_asof() that are not ideal:

  1. Numbers get duplicated.
  2. Numbers get lost.

Take the following code as an example:

import pandas as pd

df_a = pd.DataFrame({'date':pd.to_datetime(['1/15/2016','3/15/2016','5/15/2016','7/15/2016'])})
df_b = pd.DataFrame({'date':pd.to_datetime(['1/1/2016','4/1/2016','5/1/2016','6/1/2016','7/1/2016']), 'num':[1,10,100,1000,10000]})

df_x = pd.merge_asof(df_a, df_b, on = 'date')

This produces:

        date    num
0 2016-01-15      1
1 2016-03-15      1
2 2016-05-15    100
3 2016-07-15  10000

But I want:

        date    num
0 2016-01-15      1
1 2016-03-15      0
2 2016-05-15    110
3 2016-07-15  11000

...where all the rows that fall between consecutive dates are added up, rather than only the closest row being chosen.
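To make the target semantics concrete: each row of df_b should be credited to the first df_a date on or after it, i.e. to the interval (previous df_a date, this df_a date]. Below is a minimal sketch of that bucketing using NumPy's searchsorted instead of merge_asof (an illustrative technique, not something from the question). It assumes df_a is sorted by date, and the names pos, in_range, sums and result are hypothetical:

```python
import numpy as np
import pandas as pd

# Same data as the question, written as ISO dates
df_a = pd.DataFrame({'date': pd.to_datetime(['2016-01-15', '2016-03-15', '2016-05-15', '2016-07-15'])})
df_b = pd.DataFrame({'date': pd.to_datetime(['2016-01-01', '2016-04-01', '2016-05-01',
                                             '2016-06-01', '2016-07-01']),
                     'num': [1, 10, 100, 1000, 10000]})

# For each df_b date, the position of the first df_a date that is >= it.
# Assumption: df_a['date'] is sorted ascending, which searchsorted requires.
pos = np.searchsorted(df_a['date'].to_numpy(), df_b['date'].to_numpy(), side='left')

# A position equal to len(df_a) means "after the last df_a date": no bucket.
in_range = pos < len(df_a)

# Sum num per bucket, then give buckets that received no rows a 0.
sums = (df_b.loc[in_range, 'num']
        .groupby(pos[in_range]).sum()
        .reindex(range(len(df_a)), fill_value=0))

result = df_a.assign(num=sums.to_numpy())
print(result)
```

With side='left', a df_b date equal to a df_a date is counted toward that df_a date; df_b rows later than the last df_a date are dropped.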

Is that possible with merge_asof() or should I look for another solution?

Recommended answer

Thanks for posting this question. It prompted me to spend an educational couple of hours studying the merge_asof source. I do not think that your solution can be improved considerably, but I would offer a couple of tweaks to speed it up a few percent.

import pandas as pd

# if we concat the original date vector, we will only need to merge once
df_ax = pd.concat([df_a, df_a.rename(columns={'date':'date1'})], axis=1)

# do the outer merge
df_m = pd.merge(df_ax, df_b, on='date', how='outer').sort_values(by='date')

# do a single rename, inplace
df_m.rename(columns={'date': 'datex', 'date1': 'date'}, inplace=True)

# fill the gaps to allow the groupby and sum:
# df_a rows have no num (use 0), and each df_b row takes the next df_a date
df_m['num'] = df_m['num'].fillna(0)
df_m['date'] = df_m['date'].bfill()

# roll up the results.
x = df_m.groupby('date').num.sum().reset_index()
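As for whether merge_asof() itself can do this: one option (an alternative sketch, not the method from the answer above) is to reverse the merge so that each df_b row looks forward to the next df_a date with direction='forward', and then sum per df_a date. It assumes both frames are sorted by date; the helper column name bucket is hypothetical:

```python
import pandas as pd

# Same data as the question, written as ISO dates
df_a = pd.DataFrame({'date': pd.to_datetime(['2016-01-15', '2016-03-15', '2016-05-15', '2016-07-15'])})
df_b = pd.DataFrame({'date': pd.to_datetime(['2016-01-01', '2016-04-01', '2016-05-01',
                                             '2016-06-01', '2016-07-01']),
                     'num': [1, 10, 100, 1000, 10000]})

# Copy df_a's date into a hypothetical 'bucket' column so it survives the merge
keys = df_a.assign(bucket=df_a['date'])

# Attach every df_b row to the first df_a date on or after it
tagged = pd.merge_asof(df_b, keys, on='date', direction='forward')

# Sum per df_a date, then restore df_a dates that received no rows as 0
result = (tagged.groupby('bucket')['num'].sum()
          .reindex(df_a['date'], fill_value=0)
          .rename_axis('date')
          .reset_index())
print(result)
```

The reindex with fill_value=0 brings back 2016-03-15, which no df_b row maps to. Because no NaN is introduced along the way, num stays an integer column, whereas the outer-merge approach produces floats.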
