Drop Duplicates in a DataFrame if Timestamps are Close, but not Identical

Views: 33 Published: 2021/5/3 18:55:32 Tags: python pandas dataframe duplicates


Question

Imagine that I've got the following DataFrame:

            A        | B | C | D
 -------------------------------
 2000-01-01 00:00:00 | 1 | 1 | 1
 2000-01-01 00:04:30 | 1 | 2 | 2
 2000-01-01 00:04:30 | 2 | 3 | 3
 2000-01-02 00:00:00 | 1 | 4 | 4
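
For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is only a sketch: it assumes A is an ordinary datetime64 column (not the index) and that B, C and D are integers, since the post does not state the dtypes.

```python
import pandas as pd

# Assumption: A is a regular datetime64 column; B, C and D are integers.
df = pd.DataFrame({
    "A": pd.to_datetime([
        "2000-01-01 00:00:00",
        "2000-01-01 00:04:30",
        "2000-01-01 00:04:30",
        "2000-01-02 00:00:00",
    ]),
    "B": [1, 1, 2, 1],
    "C": [1, 2, 3, 4],
    "D": [1, 2, 3, 4],
})

print(df)
```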

I want to drop rows where B is equal and the values in A are "close", say, within five minutes of each other. So in this case, the first two rows should be dropped and the last two kept.

So, instead of doing df.drop_duplicates(subset=['A', 'B'], inplace=True, keep=False), I'd like something more like df.drop_duplicates(subset=['A', 'B'], inplace=True, keep=False, func={'A': some_func}), where func is a parameter I wish existed rather than one pandas provides. With

def some_func(ts1, ts2):
    delta = ts1 - ts2
    return abs(delta.total_seconds()) >= 5 * 60

Is there a way to do this in pandas?

Recommended answer

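import pandas as pd

# True where a row is less than 5 minutes after the previous row with the same B.
# Comparing against a Timedelta avoids .dt.seconds, which ignores whole days.
m = df.groupby('B').A.diff() < pd.Timedelta(minutes=5)
# Also flag the row directly above each flagged row, so both rows of a close pair are dropped.
m2 = df.B.duplicated(keep=False) & (m | m.shift(-1, fill_value=False))
df[~m2]
                    A  B  C  D
2 2000-01-01 00:04:30  2  3  3
3 2000-01-02 00:00:00  1  4  4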

Details

m is a mask that marks each row falling within 5 minutes of the previous row that has the same B.

m

0    False
1     True
2    False
3    False
Name: A, dtype: bool
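
A note on thresholding the differences used for m: the `.dt.seconds` accessor returns only the seconds component of a timedelta and ignores whole days, so a gap of one day and one minute would look like 60 seconds. The small sketch below (the sample gap is made up for illustration) compares it with `.dt.total_seconds()` and with a direct comparison against a `pd.Timedelta`:

```python
import pandas as pd

# A made-up gap of one day and one minute: far more than five minutes.
gap = pd.Series([pd.Timedelta(days=1, minutes=1)])

seconds_component = gap.dt.seconds.iloc[0]   # seconds part only, days are ignored
total = gap.dt.total_seconds().iloc[0]       # the whole duration in seconds
is_close = (gap < pd.Timedelta(minutes=5)).iloc[0]

print(seconds_component)  # 60
print(total)              # 86460.0
print(is_close)           # False
```

Comparing the timedelta itself against `pd.Timedelta(minutes=5)` sidesteps the unit question entirely, which is why it is often the safer choice.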

m2 is the final mask of all rows that must be dropped: rows whose B value is duplicated and that are either marked in m or sit directly above a row marked in m.

m2

0     True
1     True
2    False
3    False
dtype: bool
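
The mask above relies on rows with the same B sitting next to each other in the frame. As a sketch of a variant that does not depend on row order, the helper below sorts by B and A and compares each row with its previous and next neighbour inside the same group. Note that `drop_close_duplicates` and its parameter names are hypothetical, not part of pandas; the sketch assumes A is a datetime64 column and treats "close" as strictly less than the window.

```python
import pandas as pd

def drop_close_duplicates(df, key="B", time="A", window=pd.Timedelta(minutes=5)):
    """Drop every row that has a same-key neighbour closer than `window`."""
    ordered = df.sort_values([key, time])
    times = ordered.groupby(key)[time]
    # Gap to the previous and to the next row within the same key group.
    gap_prev = times.diff()
    gap_next = times.diff(-1).abs()
    # Comparisons against NaT (first/last row of a group) evaluate to False.
    close = (gap_prev < window) | (gap_next < window)
    # Restore the original row order before returning.
    return ordered[~close].sort_index()

# Sample data matching the question (assumed dtypes).
df = pd.DataFrame({
    "A": pd.to_datetime([
        "2000-01-01 00:00:00",
        "2000-01-01 00:04:30",
        "2000-01-01 00:04:30",
        "2000-01-02 00:00:00",
    ]),
    "B": [1, 1, 2, 1],
    "C": [1, 2, 3, 4],
    "D": [1, 2, 3, 4],
})

result = drop_close_duplicates(df)
print(result)
```

Because the rows are sorted within each group first, checking only adjacent neighbours is enough: if any two rows of a group are within the window, two adjacent ones are as well.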
