How to find duplicates in pandas dataframe

Views: 144 | Published: 2020/8/1 20:09:30 | Tags: python pandas dataframe duplicates

Question

Suppose I have the following series in pandas:
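The series itself did not survive in this copy of the question. Judging from the printed output in the answer (column `a`, indices 0 to 15), it presumably looked like the following. This is a reconstruction under that assumption, not the asker's original code.

```python
import pandas as pd

# Values reconstructed from the answer's printed output (assumed, not the asker's code).
s = pd.Series([0.0, 0.0, 0.0,
               0.3, 0.3, 0.3, 0.3, 0.3,
               1.0, 1.0, 1.0,
               0.2, 0.2,
               0.3, 0.3, 0.3])

# The answer works on a DataFrame that holds this data in a column named 'a'.
df = pd.DataFrame({'a': s})
print(df)
```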

I need to identify each sequence of consecutive duplicates by its first and last index. Using the above example, I need to identify the first sequence of 0.3 (from index 3 to 7) independently of the last sequence of 0.3 (from index 13 to 15).

Using Series.duplicated is insufficient because:

* keep='first' marks the first instance of each duplicated value as False, but leaves index 13 as True because it is not the first appearance of 0.3.

* The same goes for keep='last'.

* keep=False just marks all of the entries as True.

Thanks!
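The limitation described in the question can be checked directly. The sketch below assumes the sample values reconstructed from the answer's output and prints which indices each `keep` option leaves unflagged.

```python
import pandas as pd

# Sample values assumed from the answer's printed output.
s = pd.Series([0.0] * 3 + [0.3] * 5 + [1.0] * 3 + [0.2] * 2 + [0.3] * 3)

first_flags = s.duplicated(keep='first')
last_flags = s.duplicated(keep='last')
all_flags = s.duplicated(keep=False)

# Indices flagged False, i.e. the occurrence that is "kept" for each value.
print(first_flags[~first_flags].index.tolist())  # [0, 3, 8, 11]
print(last_flags[~last_flags].index.tolist())    # [2, 10, 12, 15]
print(all_flags.all())                           # True
```

Index 13 (start of the second 0.3 run) and index 7 (end of the first 0.3 run) never show up, because `duplicated` looks at values across the whole series and not at adjacency.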

Recommended answer

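I believe you need a trick here: compare each value with the shifted (previous) value for inequality with ne, take the cumsum so that every run of consecutive equal values gets its own label, and finally use drop_duplicates on those labels: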

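import pandas as pd

# df holds the question's values in a column named 'a'
# the counter goes up by one each time a value differs from the previous row
s = df['a'].ne(df['a'].shift()).cumsum()
a = s.drop_duplicates().index             # first index of each run
b = s.drop_duplicates(keep='last').index  # last index of each run

# use a new name so the original df is not overwritten
out = pd.DataFrame({'first': a, 'last': b})
print(out)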
   first  last
0      0     2
1      3     7
2      8    10
3     11    12
4     13    15
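To see why this works, it can help to look at the intermediate steps one at a time. The sketch below assumes the same sample values; the names `prev`, `changed`, `run_id`, and `steps` are only illustrative.

```python
import pandas as pd

# Sample values assumed from the answer's printed output.
df = pd.DataFrame({'a': [0.0] * 3 + [0.3] * 5 + [1.0] * 3 + [0.2] * 2 + [0.3] * 3})

prev = df['a'].shift()       # previous row's value; NaN for the first row
changed = df['a'].ne(prev)   # True wherever a new run starts
run_id = changed.cumsum()    # running count of run starts = label of the current run

steps = pd.DataFrame({'a': df['a'], 'prev': prev,
                      'changed': changed, 'run_id': run_id})
print(steps.head(5))
```

Because the label increases at every change, the two separate 0.3 runs get different labels (2 and 5), which is what `duplicated` on the raw values cannot provide. One caveat: NaN compares as not equal to NaN, so consecutive missing values would each start a new run.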

If you also want the duplicated value in a new column, change the solution slightly to use duplicated:

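# label each run of consecutive equal values
s = df['a'].ne(df['a'].shift()).cumsum()
a = df.loc[~s.duplicated(), 'a']     # value at the first row of each run
b = s.drop_duplicates(keep='last')   # last row of each run

# 'val' is a Series, so the result keeps the first index of each run as its row labels
out = pd.DataFrame({'first': a.index, 'last': b.index, 'val': a})
print(out)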
    first  last  val
0       0     2  0.0
3       3     7  0.3
8       8    10  1.0
11     11    12  0.2
13     13    15  0.3
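In the sample data every run has at least two elements, but in general a value that is not repeated forms a run of length one, with first equal to last. A minimal sketch of one way to wrap the approach in a function and drop such runs follows; `consecutive_runs` and `min_length` are hypothetical names and not part of pandas or of the original answer.

```python
import pandas as pd

def consecutive_runs(s: pd.Series, min_length: int = 2) -> pd.DataFrame:
    """Hypothetical helper: one row per run of equal consecutive values."""
    run_id = s.ne(s.shift()).cumsum()
    starts = (~run_id.duplicated()).to_numpy()             # first row of each run
    ends = (~run_id.duplicated(keep='last')).to_numpy()    # last row of each run
    out = pd.DataFrame({
        'first': s.index[starts],
        'last': s.index[ends],
        'val': s[starts].to_numpy(),
        'length': s.groupby(run_id).size().to_numpy(),     # rows per run, in run order
    })
    # keep only runs that are long enough to count as consecutive duplicates
    return out[out['length'] >= min_length].reset_index(drop=True)

# Small made-up series: 0.5 appears twice, but never twice in a row.
example = pd.Series([0.0, 0.0, 0.5, 0.3, 0.3, 0.3, 0.5])
print(consecutive_runs(example))
```

With `min_length=1` the function returns every run, which matches the output of the answer's code.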

If you need the run label as a new column:

df['count'] = df['a'].ne(df['a'].shift()).cumsum()
print (df)
      a  count
0   0.0      1
1   0.0      1
2   0.0      1
3   0.3      2
4   0.3      2
5   0.3      2
6   0.3      2
7   0.3      2
8   1.0      3
9   1.0      3
10  1.0      3
11  0.2      4
12  0.2      4
13  0.3      5
14  0.3      5
15  0.3      5
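Once the run label exists as a column, one possible use is to group by it and aggregate, which yields the first index, last index, value, and length of each run in one step. This is an alternative to the drop_duplicates approach and not part of the original answer; it assumes the sample values shown above and a pandas version that supports named aggregation (0.25 or later).

```python
import pandas as pd

# Sample values assumed from the printed output above.
df = pd.DataFrame({'a': [0.0] * 3 + [0.3] * 5 + [1.0] * 3 + [0.2] * 2 + [0.3] * 3})
df['count'] = df['a'].ne(df['a'].shift()).cumsum()

# reset_index() turns the row labels into a regular column named 'index'
runs = (df.reset_index()
          .groupby('count')
          .agg(first=('index', 'first'),
               last=('index', 'last'),
               val=('a', 'first'),
               size=('a', 'size')))
print(runs)
```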
