How can I drop consecutive duplicate rows in one column based on condition/grouping from another column?

Views: 78 Published: 2020/8/1 19:52:04 Tags: python python-3.x pandas duplicates

Question

I have a large dataframe (approx. 10k rows). The first few rows, which I'll call df_a, look like this:

logtime             | zone  | value   
01/01/2017 06:05:00 | 0     | 14.5
01/01/2017 06:05:00 | 1     | 14.5
01/01/2017 06:05:00 | 2     | 17.0
01/01/2017 06:25:00 | 0     | 14.5
01/01/2017 06:25:00 | 1     | 14.5
01/01/2017 06:25:00 | 2     | 10.0
01/01/2017 06:50:00 | 0     | 10.0
01/01/2017 06:50:00 | 1     | 10.0
01/01/2017 06:50:00 | 2     | 10.0
01/01/2017 07:50:00 | 0     | 14.5
01/01/2017 07:50:00 | 1     | 14.5
01/01/2017 07:50:00 | 2     | 14.5
etc.

I am looking to drop consecutive duplicates within each zone, so that I am left only with information about how the zones change. For example, if zone 1 is at 14.5 over two logtimes, the second row is dropped, and the next row kept for zone 1 is the one where it changes to 10.0. That would leave me with a dataframe like:

logtime             | zone  | value   
01/01/2017 06:05:00 | 0     | 14.5
01/01/2017 06:05:00 | 1     | 14.5
01/01/2017 06:05:00 | 2     | 17.0
01/01/2017 06:25:00 | 2     | 10.0
01/01/2017 06:50:00 | 0     | 10.0
01/01/2017 06:50:00 | 1     | 10.0
01/01/2017 07:50:00 | 0     | 14.5
01/01/2017 07:50:00 | 1     | 14.5
01/01/2017 07:50:00 | 2     | 14.5
etc.

My understanding is that drop_duplicates will only retain unique values, so it doesn't work for my aim: a value that returns later (such as 14.5 at 07:50) would be dropped as well.

I also tried a .loc and shift approach:

removeduplicates = df.loc[ (df.logtime != df.logtime.shift(1)) | (df.zone != df.zone.shift(1)) | (df.value != df.value.shift(1))]

However, this runs without error but does not produce the desired output. Thanks!

Recommended answer

You can create a Boolean mask that is True wherever the diff between successive values within each zone group is not equal to 0, and use it to filter the dataframe:

print(df[df.groupby('zone')['value'].diff().ne(0)])
                logtime  zone  value
0  01/01/2017 06:05:00      0   14.5
1  01/01/2017 06:05:00      1   14.5
2  01/01/2017 06:05:00      2   17.0
5  01/01/2017 06:25:00      2   10.0
6  01/01/2017 06:50:00      0   10.0
7  01/01/2017 06:50:00      1   10.0
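
For anyone who wants to try this end to end, here is a minimal, self-contained sketch. It rebuilds the twelve sample rows of df_a from the question (the variable names step, changed, prev and result_alt are my own, not from the original post) and applies the per-zone diff mask. It also shows a groupby + shift variant, which is an alternative I am suggesting rather than part of the answer above; it may be handy when value is not numeric, since diff needs numbers.

```python
import pandas as pd

# Sample rows copied from df_a in the question.
df = pd.DataFrame({
    "logtime": ["01/01/2017 06:05:00"] * 3 + ["01/01/2017 06:25:00"] * 3
               + ["01/01/2017 06:50:00"] * 3 + ["01/01/2017 07:50:00"] * 3,
    "zone": [0, 1, 2] * 4,
    "value": [14.5, 14.5, 17.0, 14.5, 14.5, 10.0,
              10.0, 10.0, 10.0, 14.5, 14.5, 14.5],
})

# Difference from the previous reading of the same zone.
# The first reading of each zone has no predecessor, so its diff is NaN.
step = df.groupby("zone")["value"].diff()

# NaN != 0 evaluates to True, so the first reading of every zone is kept,
# along with every row where the value actually changed.
changed = step.ne(0)
result = df[changed]
print(result)

# Alternative sketch (an assumption of mine, not from the answer):
# compare each value with the previous value in the same zone via shift.
# This avoids arithmetic, so it also works for string or categorical values.
prev = df.groupby("zone")["value"].shift()
result_alt = df[df["value"].ne(prev)]
```

The asker's original .loc/shift attempt compares each row with the row directly above it in the frame, which belongs to a different zone because the zones are interleaved; grouping by zone first is what makes the comparison "previous reading of the same zone". One caveat: if value itself contains NaN, both variants treat consecutive NaN rows as changes, because NaN never compares equal to NaN.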
