How to flag last duplicate element in a pandas DataFrame

Views: 74 | Published: 2020/5/24 3:20:36 | Tags: python, pandas

Question

As you know, the .duplicated method finds duplicates in a column, but what I need is to flag the last duplicated element. My data is already ordered by date.

Here is the expected result, Last_dup, for the column Policy_id:

Id  Policy_id   Start_Date  Last_dup
0   b123        2019/02/24  0
1   b123        2019/03/24  0
2   b123        2019/04/24  1
3   c123        2018/09/01  0
4   c123        2018/10/01  1
5   d123        2017/02/24  0
6   d123        2017/03/24  1

Thanks in advance for your help and support!

Recommended answer

Use Series.duplicated or DataFrame.duplicated on the column of interest with the parameter keep='last', which marks every occurrence except the last one as True. Then invert the mask and convert it to integers (True/False becomes 1/0), or use numpy.where:

import numpy as np
# keep='last' flags every occurrence except the last one, so invert the mask
df['Last_dup1'] = (~df['Policy_id'].duplicated(keep='last')).astype(int)
df['Last_dup1'] = np.where(df['Policy_id'].duplicated(keep='last'), 0, 1)

Or:

df['Last_dup1'] = (~df.duplicated(subset=['Policy_id'], keep='last')).astype(int)
df['Last_dup1'] = np.where(df.duplicated(subset=['Policy_id'], keep='last'), 0, 1)
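A self-contained version of the answer, as a minimal sketch. It rebuilds the question's sample data and checks that both forms reproduce the expected Last_dup column. Start_Date is kept as plain strings because only the existing row order matters here. The names mask and last_dup_np are illustrative, not from the original answer.

```python
import numpy as np
import pandas as pd

# Sample data transcribed from the question's table; Last_dup holds the expected flags.
df = pd.DataFrame({
    'Id': range(7),
    'Policy_id': ['b123', 'b123', 'b123', 'c123', 'c123', 'd123', 'd123'],
    'Start_Date': ['2019/02/24', '2019/03/24', '2019/04/24',
                   '2018/09/01', '2018/10/01', '2017/02/24', '2017/03/24'],
    'Last_dup': [0, 0, 1, 0, 1, 0, 1],
})

# duplicated(keep='last') is True for every occurrence except the last one,
# so inverting it marks the last row of each Policy_id.
mask = df.duplicated(subset=['Policy_id'], keep='last')
df['Last_dup1'] = (~mask).astype(int)

# np.where variant: 0 for earlier occurrences, 1 for the last one.
last_dup_np = np.where(mask, 0, 1)

print(df)
```

Because the data is already sorted by date, the last occurrence in row order is also the latest Start_Date for each Policy_id.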

print(df)
   Id Policy_id  Start_Date  Last_dup  Last_dup1
0   0      b123  2019/02/24         0          0
1   1      b123  2019/03/24         0          0
2   2      b123  2019/04/24         1          1
3   3      c123  2018/09/01         0          0
4   4      c123  2018/10/01         1          1
5   5      d123  2017/02/24         0          0
6   6      d123  2017/03/24         1          1
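Two caveats may be worth checking. First, the inverted keep='last' mask also flags a Policy_id that appears only once. Second, it relies on the rows already being in date order. The sketch below uses hypothetical data (policy e123 and the shuffled dates are invented for illustration) to show both fixes. It sorts by a parsed Start_Date first. It then ANDs the mask with duplicated(keep=False), which is True for every row whose value occurs more than once, so that only genuine duplicates are flagged. The column name Last_dup2 is also hypothetical.

```python
import pandas as pd

# Hypothetical data: e123 occurs only once, and rows are NOT in date order.
df = pd.DataFrame({
    'Policy_id': ['b123', 'b123', 'c123', 'e123', 'c123'],
    'Start_Date': ['2019/03/24', '2019/02/24', '2018/09/01', '2020/01/01', '2018/10/01'],
})

# Parse the dates and put rows in chronological order (mergesort is a stable sort).
df['Start_Date'] = pd.to_datetime(df['Start_Date'], format='%Y/%m/%d')
df = df.sort_values('Start_Date', kind='mergesort')

s = df['Policy_id']
# Last occurrence of every value, including values that occur only once.
df['Last_dup1'] = (~s.duplicated(keep='last')).astype(int)
# Hypothetical stricter flag: last occurrence only for values that occur more than once.
df['Last_dup2'] = (s.duplicated(keep=False) & ~s.duplicated(keep='last')).astype(int)

# Restore the original row order for display.
df = df.sort_index()
print(df)
```

Here row 3 (e123) gets Last_dup1 = 1 but Last_dup2 = 0. Row 0 is flagged for b123 because its date is later than row 1's, even though it comes first in the original order.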
