How can I drop consecutive duplicate rows in one column based on a condition/grouping from another column?
Problem description
I have a large dataframe (approx. 10k rows). The first few rows, which I'll call df_a, look like this:
logtime | zone | value
01/01/2017 06:05:00 | 0 | 14.5
01/01/2017 06:05:00 | 1 | 14.5
01/01/2017 06:05:00 | 2 | 17.0
01/01/2017 06:25:00 | 0 | 14.5
01/01/2017 06:25:00 | 1 | 14.5
01/01/2017 06:25:00 | 2 | 10.0
01/01/2017 06:50:00 | 0 | 10.0
01/01/2017 06:50:00 | 1 | 10.0
01/01/2017 06:50:00 | 2 | 10.0
01/01/2017 07:50:00 | 0 | 14.5
01/01/2017 07:50:00 | 1 | 14.5
01/01/2017 07:50:00 | 2 | 14.5
etc.
I want to drop consecutive duplicates within each zone, so that I am left only with information about how each zone changes. For example, if zone 1 stays at 14.5 over two logtimes, the repeated row is dropped, and the next row kept for zone 1 is the one where it changes to 10.0. That would leave me with a dataframe like:
logtime | zone | value
01/01/2017 06:05:00 | 0 | 14.5
01/01/2017 06:05:00 | 1 | 14.5
01/01/2017 06:05:00 | 2 | 17.0
01/01/2017 06:25:00 | 2 | 10.0
01/01/2017 06:50:00 | 0 | 10.0
01/01/2017 06:50:00 | 1 | 10.0
01/01/2017 07:50:00 | 0 | 14.5
01/01/2017 07:50:00 | 1 | 14.5
01/01/2017 07:50:00 | 2 | 14.5
etc.
My understanding is that drop_duplicates will only retain unique values, so it doesn't work for my aim.
I also tried an approach using .loc and shift:
removeduplicates = df.loc[ (df.logtime != df.logtime.shift(1)) | (df.zone != df.zone.shift(1)) | (df.value != df.value.shift(1))]
However, this neither raises an error nor produces the desired output. Thanks!
Recommended answer
You can create a Boolean mask that is True wherever the diff between successive values within each zone group is not equal to 0:
print(df[df.groupby('zone')['value'].diff().ne(0)])
logtime zone value
0 01/01/2017 06:05:00 0 14.5
1 01/01/2017 06:05:00 1 14.5
2 01/01/2017 06:05:00 2 17.0
5 01/01/2017 06:25:00 2 10.0
6 01/01/2017 06:50:00 0 10.0
7 01/01/2017 06:50:00 1 10.0
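For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the dataframe holds exactly the 12 sample rows from the question (the real data has around 10k rows) and that the rows are already sorted by logtime. The names mask_diff, mask_shift and attempt are only illustrative. It also shows a shift-based variant that should work for non-numeric values, and why the attempt in the question keeps every row: it compares each row with the row physically above it, which always belongs to a different zone.

```python
import pandas as pd

# Sample rows copied from the question; the real dataframe is much larger.
df = pd.DataFrame({
    "logtime": ["01/01/2017 06:05:00"] * 3 + ["01/01/2017 06:25:00"] * 3
               + ["01/01/2017 06:50:00"] * 3 + ["01/01/2017 07:50:00"] * 3,
    "zone": [0, 1, 2] * 4,
    "value": [14.5, 14.5, 17.0, 14.5, 14.5, 10.0,
              10.0, 10.0, 10.0, 14.5, 14.5, 14.5],
})

# Approach from the answer: difference to the previous value within each zone.
# The first row of every zone has a NaN diff, and NaN.ne(0) is True, so it is kept.
mask_diff = df.groupby("zone")["value"].diff().ne(0)

# Variant: compare each value with the previous value of the same zone.
# This does not rely on subtraction, so it also suits string or categorical values.
mask_shift = df["value"].ne(df.groupby("zone")["value"].shift())

# The attempt from the question compares against the previous row regardless of zone.
# Because zones are interleaved (0, 1, 2, 0, ...), the zone test is always True,
# so every row is kept and nothing is filtered.
attempt = (
    (df.logtime != df.logtime.shift(1))
    | (df.zone != df.zone.shift(1))
    | (df.value != df.value.shift(1))
)

result = df[mask_diff]
print(result)
```

On these 12 rows both masks select the rows with index 0, 1, 2, 5, 6, 7, 9, 10 and 11, so the 07:50:00 rows are kept as well, matching the desired output in the question. Note that consecutive NaN values within a zone would all be kept by either mask, since NaN never compares equal to NaN.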