pandas: Remove all rows within time interval of another series's time index (i.e. time range exclusion)

Question

Suppose I have two dataframes:

#df1
time
2016-09-12 13:00:00.017    1.0
2016-09-12 13:00:03.233    1.0
2016-09-12 13:00:10.256    1.0
2016-09-12 13:00:19.605    1.0

#df2
time
2016-09-12 13:00:00.017    1.0
2016-09-12 13:00:00.233    0.0
2016-09-12 13:00:01.016    1.0
2016-09-12 13:00:01.505    0.0
2016-09-12 13:00:06.017    1.0
2016-09-12 13:00:07.233    0.0
2016-09-12 13:00:08.256    1.0
2016-09-12 13:00:19.705    0.0

I want to remove every row in df2 whose timestamp falls within 1 second after any time index in df1, which yields:

#result
time
2016-09-12 13:00:01.505    0.0
2016-09-12 13:00:06.017    1.0
2016-09-12 13:00:07.233    0.0
2016-09-12 13:00:08.256    1.0

What's the most efficient way to do this? I don't see anything useful for time range exclusion in the API.

Recommended answer

You can use pd.merge_asof, which was added in pandas 0.19.0 and accepts a tolerance argument that limits how far apart the matched keys may be.

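import pandas as pd

# Assuming time is the index of both frames, move it into a regular column
df1.reset_index(inplace=True)
df2.reset_index(inplace=True)

# df2 rows with no df1 match within 1 second come back with NaN; keep only those
merged = pd.merge_asof(df2, df1, on='time', tolerance=pd.Timedelta('1s'))
df2.loc[merged.isnull().any(axis=1)]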

Note that matching is done in the backward direction by default: for each row of the left DataFrame (df2), the match is the last row of the right DataFrame (df1) whose "on" key ("time") is less than or equal to the left row's key. The tolerance therefore extends only backward, so a df2 row is matched only when a df1 timestamp lies at most 1 second before it (a one-sided range).
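As a self-contained illustration of the backward match, here is a minimal sketch that rebuilds the question's data and reproduces the expected result. The column name "value" and the helper column "df1_time" are assumptions made for this example; the question does not name its columns. The helper column is used so that only a failed match (NaT) marks a row, rather than any NaN already present in the data.

```python
import pandas as pd

# Data copied from the question; the column name "value" is an assumption.
df1 = pd.DataFrame(
    {"value": [1.0, 1.0, 1.0, 1.0]},
    index=pd.DatetimeIndex(
        ["2016-09-12 13:00:00.017", "2016-09-12 13:00:03.233",
         "2016-09-12 13:00:10.256", "2016-09-12 13:00:19.605"],
        name="time",
    ),
)
df2 = pd.DataFrame(
    {"value": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]},
    index=pd.DatetimeIndex(
        ["2016-09-12 13:00:00.017", "2016-09-12 13:00:00.233",
         "2016-09-12 13:00:01.016", "2016-09-12 13:00:01.505",
         "2016-09-12 13:00:06.017", "2016-09-12 13:00:07.233",
         "2016-09-12 13:00:08.256", "2016-09-12 13:00:19.705"],
        name="time",
    ),
)

left = df2.reset_index()
# Keep only df1's timestamps, plus a copy ("df1_time", hypothetical name)
# that becomes NaT wherever no match is found.
right = df1.reset_index()[["time"]]
right["df1_time"] = right["time"]

# Default direction is backward: match the latest df1 time <= the df2 time,
# but only if it is at most 1 second earlier.
merged = pd.merge_asof(left, right, on="time", tolerance=pd.Timedelta("1s"))

# merge_asof keeps the left rows in order, so the mask lines up with df2.
keep = merged["df1_time"].isna().to_numpy()
result = df2[keep]
print(result)
```

Both frames must be sorted by time for merge_asof to accept them, which the sample data already is.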

To look both backward and forward, pass direction='nearest' in the function call, available since pandas 0.20.0. The tolerance then applies on both sides, giving a +/- matching window around each df1 timestamp.
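The difference between the two directions can be seen on a small example. The data below is hypothetical and not taken from the question: one reference time and three rows placed 0.5 s before it, 0.5 s after it, and 2 s after it. The names ref, rows, ref_time and rows_outside are invented for this sketch.

```python
import pandas as pd

# Hypothetical data: a single reference timestamp at 13:00:10.000.
ref = pd.DataFrame({"time": pd.to_datetime(["2016-09-12 13:00:10.000"])})
ref["ref_time"] = ref["time"]  # NaT in this column after the merge means "no match"

rows = pd.DataFrame({
    "time": pd.to_datetime([
        "2016-09-12 13:00:09.500",  # 0.5 s before the reference
        "2016-09-12 13:00:10.500",  # 0.5 s after the reference
        "2016-09-12 13:00:12.000",  # 2 s after the reference
    ]),
    "value": [1.0, 2.0, 3.0],
})

def rows_outside(direction):
    """Return the rows that have no reference time within 1 s in the given direction."""
    merged = pd.merge_asof(
        rows, ref, on="time",
        tolerance=pd.Timedelta("1s"),
        direction=direction,
    )
    return rows[merged["ref_time"].isna().to_numpy()]

backward_kept = rows_outside("backward")
nearest_kept = rows_outside("nearest")

print(backward_kept["value"].tolist())  # [1.0, 3.0]
print(nearest_kept["value"].tolist())   # [3.0]
```

With the backward default, the row 0.5 s before the reference survives because nothing precedes it; with direction='nearest' it is excluded as well.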
