Pandas: Drop consecutive duplicates

Published: 2020/5/23 21:14:18. Tags: python, pandas

Question

What's the most efficient way to drop only consecutive duplicates in pandas?

drop_duplicates gives this:

In [2]: import pandas

In [3]: a = pandas.Series([1,2,2,3,2], index=[1,2,3,4,5])

In [4]: a.drop_duplicates()
Out[4]: 
1    1
2    2
4    3
dtype: int64

But I want this:

In [4]: a.something()
Out[4]: 
1    1
2    2
4    3
5    2
dtype: int64

Recommended answer

Use shift:

a.loc[a.shift(-1) != a]

Out[3]:
1    1
3    2
4    3
5    2
dtype: int64

The above uses a boolean condition: we compare the Series against itself shifted by -1 rows to create the mask.
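To make the mask visible, here is a minimal sketch that rebuilds the question's Series and prints the intermediate boolean values. The variable names `mask_next` and `kept_last` are only illustrative and are not part of the original answer.

```python
import pandas as pd

a = pd.Series([1, 2, 2, 3, 2], index=[1, 2, 3, 4, 5])

# shift(-1) moves every value up one row, so each element is compared
# with the NEXT element; the last row is compared with NaN.
mask_next = a.shift(-1) != a

# Rows where the next value differs, i.e. the last element of each run.
kept_last = a.loc[mask_next]

print(mask_next.tolist())        # [True, False, True, True, True]
print(kept_last.index.tolist())  # [1, 3, 4, 5]
```

Because the comparison looks at the following row, the row that survives from the run `2, 2` is the one at index 3, not index 2.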

Another method is to use diff:

In [82]:

a.loc[a.diff() != 0]
Out[82]:
1    1
2    2
4    3
5    2
dtype: int64

But this is slower than the original method if you have a large number of rows.
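Speed aside, diff works by subtracting neighbouring values, so it needs numeric data, whereas a shift comparison only needs equality. A minimal sketch with made-up string data (the Series `s` is an assumption for illustration, not from the question):

```python
import pandas as pd

# Hypothetical example data: strings, for which subtraction is not defined.
s = pd.Series(["a", "a", "b", "b", "a"])

# shift() (default periods=1) compares each element with the previous one;
# the first row is compared with NaN and is therefore always kept.
deduped = s.loc[s.shift() != s]

print(deduped.tolist())        # ['a', 'b', 'a']
print(deduped.index.tolist())  # [0, 2, 4]
```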

Update

Thanks to Bjarke Ebert for pointing out a subtle error: I should actually use shift(1), or just shift() since the default period is 1. This keeps the first value of each consecutive run:

In [87]:

a.loc[a.shift() != a]
Out[87]:
1    1
2    2
4    3
5    2
dtype: int64

Note the difference in index values compared with the shift(-1) result (index 2 is kept instead of index 3). Thanks, @BjarkeEbert!
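The two variants can be wrapped in one small function. This is only a sketch: `drop_consecutive_duplicates` and its `keep` parameter are hypothetical names chosen here, not a pandas API.

```python
import pandas as pd


def drop_consecutive_duplicates(s: pd.Series, keep: str = "first") -> pd.Series:
    """Hypothetical helper: collapse each run of equal values to one element."""
    if keep == "first":
        mask = s.shift() != s    # differs from the previous element
    elif keep == "last":
        mask = s.shift(-1) != s  # differs from the next element
    else:
        raise ValueError("keep must be 'first' or 'last'")
    return s.loc[mask]


a = pd.Series([1, 2, 2, 3, 2], index=[1, 2, 3, 4, 5])
first = drop_consecutive_duplicates(a)
last = drop_consecutive_duplicates(a, keep="last")

print(first.index.tolist())  # [1, 2, 4, 5]
print(last.index.tolist())   # [1, 3, 4, 5]
```

One caveat with this comparison: NaN is never equal to NaN, so a run of consecutive NaN values is not collapsed by either variant.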
