How to merge two dataframes based on the closest (or most recent) timestamp

Question

Suppose I have a dataframe df1 with columns 'A' and 'B'. 'A' is a column of timestamps (e.g. Unix time) and 'B' is a column of values.

Suppose I also have a dataframe df2 with columns 'C' and 'D'. 'C' is also a Unix-time column and 'D' contains some other values.

I would like to fuzzy-merge the dataframes by joining on the timestamp. However, if the timestamps don't match exactly (which they most likely won't), I would like each row to merge on the closest entry in 'C' that comes before the timestamp in 'A'.

pd.merge does not support this, so I find myself converting the dataframes with to_dict() and using explicit iteration to solve it. Is there a way to do this in pandas?
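
The matching rule described above (for each A, take the most recent C at or before it) can be sketched with NumPy alone. This is an illustrative sketch, not part of the original question. The data is small and hypothetical, using plain integers as Unix timestamps. The sketch also assumes both columns are sorted and that df2 keeps its default 0..n-1 index. np.searchsorted with side='right' counts how many C values are <= each A, so subtracting 1 gives the position of the most recent one, and -1 means there is none.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: A and C are sorted integer Unix timestamps.
df1 = pd.DataFrame({'A': [80, 100, 150, 200, 260], 'B': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'C': [90, 140, 145, 250], 'D': [10, 20, 30, 40]})

# For each A, count the C values <= A; minus one is the position of the
# most recent C at or before A (-1 when no such C exists).
pos = np.searchsorted(df2['C'].to_numpy(), df1['A'].to_numpy(), side='right') - 1

# reindex is label-based: with df2's default RangeIndex the positions are
# also labels, and the missing label -1 becomes an all-NaN row.
matched = df2.reindex(pos).reset_index(drop=True)

result = pd.concat([df1, matched], axis=1)
print(result)
```

Because reindex fills the -1 label with NaN, rows of df1 with no earlier C keep their place and get NaN in C and D rather than being dropped.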

Answer

numpy.searchsorted()

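numpy.searchsorted() finds the index positions to merge on (see the docs). With the default side='left', each timestamp in C is mapped to the first row of df1 whose A is at or after it. A row of df1 therefore picks up every C that falls after the previous A and at or before its own A, which can be several rows or none (NaT below). Hopefully the code below gets you closer to what you're looking for: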

from datetime import datetime, timedelta
from random import randrange
import numpy as np
import pandas as pd
start = datetime(2015, 12, 1)
df1 = pd.DataFrame({'A': [start + timedelta(minutes=randrange(60)) for i in range(10)], 'B': [1] * 10}).sort_values('A').reset_index(drop=True)
df2 = pd.DataFrame({'C': [start + timedelta(minutes=randrange(60)) for i in range(10)], 'D': [2] * 10}).sort_values('C').reset_index(drop=True)
# Label each row of df2 with the position of the first A that is >= its C
df2.index = np.searchsorted(df1.A.values, df2.C.values)
print(pd.merge(left=df1, right=df2, left_index=True, right_index=True, how='left'))

                    A  B                   C   D
0 2015-12-01 00:01:00  1                 NaT NaN
1 2015-12-01 00:02:00  1 2015-12-01 00:02:00   2
2 2015-12-01 00:02:00  1                 NaT NaN
3 2015-12-01 00:12:00  1 2015-12-01 00:05:00   2
4 2015-12-01 00:16:00  1 2015-12-01 00:14:00   2
4 2015-12-01 00:16:00  1 2015-12-01 00:14:00   2
5 2015-12-01 00:28:00  1 2015-12-01 00:22:00   2
6 2015-12-01 00:30:00  1                 NaT NaN
7 2015-12-01 00:39:00  1 2015-12-01 00:31:00   2
7 2015-12-01 00:39:00  1 2015-12-01 00:39:00   2
8 2015-12-01 00:55:00  1 2015-12-01 00:40:00   2
8 2015-12-01 00:55:00  1 2015-12-01 00:46:00   2
8 2015-12-01 00:55:00  1 2015-12-01 00:54:00   2
9 2015-12-01 00:57:00  1                 NaT NaN
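
pandas 0.19.0 and later also provide pandas.merge_asof, which performs this "most recent earlier key" join directly. This is a different technique from the searchsorted answer. It is shown here as a minimal sketch on hypothetical data: the timestamps are borrowed from the output above and the D values are made up. merge_asof requires both key columns to be sorted.

```python
import pandas as pd

# Hypothetical data; merge_asof requires both key columns to be sorted.
df1 = pd.DataFrame({'A': pd.to_datetime(['2015-12-01 00:01', '2015-12-01 00:12', '2015-12-01 00:30']),
                    'B': [1, 1, 1]})
df2 = pd.DataFrame({'C': pd.to_datetime(['2015-12-01 00:02', '2015-12-01 00:05', '2015-12-01 00:22']),
                    'D': [10, 20, 30]})

# direction='backward' (the default) picks, for each A, the last C with C <= A;
# rows with no earlier C get NaT/NaN.
result = pd.merge_asof(df1, df2, left_on='A', right_on='C', direction='backward')
print(result)
```

Unlike the searchsorted merge, every row of df1 appears exactly once, and A = 00:30 is matched to C = 00:22 instead of getting NaT. The tolerance parameter of merge_asof can cap how far back a match may lie.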
