Filtering pandas dataframe by difference of adjacent rows

Question

I have a dataframe indexed by datetime. I want to filter out rows based on the difference between their index and the index of the previous row.

So, if my criterion is "remove all rows that are more than one hour later than the previous row", the second row in the example below should be removed:

2005-07-15 17:00:00  
2005-07-17 18:00:00  

In the following case, by contrast, both rows stay:

2005-07-17 23:00:00  
2005-07-18 00:00:00 


Recommended answer

It seems you need boolean indexing: use diff to get the difference between adjacent index values, then compare it with a 1 hour Timedelta:

import pandas as pd

# Sample data: five rows indexed by datetime
dates = ['2005-07-15 17:00:00', '2005-07-17 18:00:00', '2005-07-17 19:00:00',
         '2005-07-17 23:00:00', '2005-07-18 00:00:00']
df = pd.DataFrame({'a': range(5)}, index=pd.to_datetime(dates))

print (df)
                     a
2005-07-15 17:00:00  0
2005-07-17 18:00:00  1
2005-07-17 19:00:00  2
2005-07-17 23:00:00  3
2005-07-18 00:00:00  4

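# The first row has no previous row, so its difference is NaT; fill it with a zero Timedelta
diff = df.index.to_series().diff().fillna(pd.Timedelta(0))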
print (diff)
2005-07-15 17:00:00   0 days 00:00:00
2005-07-17 18:00:00   2 days 01:00:00
2005-07-17 19:00:00   0 days 01:00:00
2005-07-17 23:00:00   0 days 04:00:00
2005-07-18 00:00:00   0 days 01:00:00
dtype: timedelta64[ns]

mask = diff <= pd.Timedelta(1, unit='h')
print (mask)
2005-07-15 17:00:00     True
2005-07-17 18:00:00    False
2005-07-17 19:00:00     True
2005-07-17 23:00:00    False
2005-07-18 00:00:00     True
dtype: bool

df = df[mask]
print (df)
                     a
2005-07-15 17:00:00  0
2005-07-17 19:00:00  2
2005-07-18 00:00:00  4
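
The steps above can be wrapped into a small helper. This is only a minimal sketch: the function name `filter_by_gap` and its `max_gap` parameter are hypothetical and not part of pandas or of the original answer. It skips the `fillna` step and relies on the fact that comparing `NaT` with a `Timedelta` evaluates to `False`, so the first row is always kept.

```python
import pandas as pd


def filter_by_gap(df, max_gap="1h"):
    """Keep rows whose index is at most `max_gap` after the previous row's index.

    Hypothetical helper for illustration; assumes a unique DatetimeIndex.
    """
    # Gap between each index value and the one before it.
    # The first row has no predecessor, so its gap is NaT.
    gaps = df.index.to_series().diff()
    # NaT > Timedelta evaluates to False, so the first row is never flagged.
    too_late = gaps > pd.Timedelta(max_gap)
    return df[~too_late]


dates = ["2005-07-15 17:00:00", "2005-07-17 18:00:00", "2005-07-17 19:00:00",
         "2005-07-17 23:00:00", "2005-07-18 00:00:00"]
df = pd.DataFrame({"a": range(5)}, index=pd.to_datetime(dates))

result = filter_by_gap(df)
print(result)
```

As in the answer above, each row is compared with its predecessor in the original frame, not with the last row that was kept, so removing one row does not change the decision for the rows after it.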
