How to flag the last duplicate element in a pandas DataFrame
Problem description
As you know, there is the .duplicated method for finding duplicates in a column. What I need is to flag the last duplicated element, given that my data is ordered by date.
Here is the expected result, Last_dup, for the column Policy_id:
Id Policy_id Start_Date Last_dup
0 b123 2019/02/24 0
1 b123 2019/03/24 0
2 b123 2019/04/24 1
3 c123 2018/09/01 0
4 c123 2018/10/01 1
5 d123 2017/02/24 0
6 d123 2017/03/24 1
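To try the answers below, you can rebuild the sample frame from the table above. This is a minimal sketch. The values are copied from the table. Id is treated as an ordinary column, which matches the answer's printed output. Start_Date is parsed with pd.to_datetime, which is an assumption, since the question does not say whether the dates are stored as strings or datetimes.

```python
import pandas as pd

# Sample data copied from the expected-result table in the question
df = pd.DataFrame({
    'Id': range(7),
    'Policy_id': ['b123', 'b123', 'b123', 'c123', 'c123', 'd123', 'd123'],
    # Assumption: dates parsed to datetime64 rather than kept as strings
    'Start_Date': pd.to_datetime(['2019/02/24', '2019/03/24', '2019/04/24',
                                  '2018/09/01', '2018/10/01',
                                  '2017/02/24', '2017/03/24']),
    'Last_dup': [0, 0, 1, 0, 1, 0, 1],  # expected flags from the question
})
print(df)
```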
Thanks in advance for your help and support!
Recommended answer
Use Series.duplicated, or DataFrame.duplicated with the column passed as subset, together with the parameter keep='last'. Then invert the mask and convert it to integers so that True/False maps to 1/0. Alternatively, use numpy.where:
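Here is a step-by-step look at the mask that the one-liners below rely on. It uses a standalone Series holding only the Policy_id values from the question. With keep='last', duplicated marks every occurrence except the last one of each value as True. Inverting that mask therefore leaves True only on the last row of each group.

```python
import numpy as np
import pandas as pd

# Policy_id values from the question's table
policy = pd.Series(['b123', 'b123', 'b123', 'c123', 'c123', 'd123', 'd123'])

# keep='last': True for every occurrence except the last one of each value
mask = policy.duplicated(keep='last')
print(mask.tolist())
# [True, True, False, True, False, True, False]

# Invert so the last occurrence is True, then map True/False to 1/0
last_dup = (~mask).astype(int)
print(last_dup.tolist())
# [0, 0, 1, 0, 1, 0, 1]

# np.where(condition, value_if_true, value_if_false) yields the same flags
last_dup_np = np.where(mask, 0, 1)
print(last_dup_np.tolist())
# [0, 0, 1, 0, 1, 0, 1]
```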
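import numpy as np
# Last occurrence of each Policy_id -> 1, earlier occurrences -> 0
df['Last_dup1'] = (~df['Policy_id'].duplicated(keep='last')).astype(int)
# Equivalent, using numpy.where
df['Last_dup1'] = np.where(df['Policy_id'].duplicated(keep='last'), 0, 1)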
Or, using DataFrame.duplicated with subset:
df['Last_dup1'] = (~df.duplicated(subset=['Policy_id'], keep='last')).astype(int)
df['Last_dup1'] = np.where(df.duplicated(subset=['Policy_id'], keep='last'), 0, 1)
print(df)
Id Policy_id Start_Date Last_dup Last_dup1
0 0 b123 2019/02/24 0 0
1 1 b123 2019/03/24 0 0
2 2 b123 2019/04/24 1 1
3 3 c123 2018/09/01 0 0
4 4 c123 2018/10/01 1 1
5 5 d123 2017/02/24 0 0
6 6 d123 2017/03/24 1 1
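The keep='last' approach works because the rows are already in date order within each Policy_id. If that is not guaranteed, one option is to sort by Start_Date first. This is a sketch under that assumption. The shuffled rows and the extra single-row policy e123 are made-up illustration data, not from the question. Assigning the result back relies on pandas aligning on the index, so the flags land on the original rows. A groupby alternative based on cumcount(ascending=False) gives the same flags. Note that a Policy_id appearing only once is also flagged 1 by both methods. If only genuinely duplicated policies should be flagged, the last column combines the flag with duplicated(keep=False).

```python
import pandas as pd

# Hypothetical shuffled data, plus a policy (e123) that appears only once
df = pd.DataFrame({
    'Policy_id': ['c123', 'b123', 'b123', 'c123', 'b123', 'e123'],
    'Start_Date': pd.to_datetime(['2018/10/01', '2019/04/24', '2019/02/24',
                                  '2018/09/01', '2019/03/24', '2020/01/01']),
})

# Sort by date (stable sort keeps the original order of ties);
# assignment back into df aligns on the index, restoring the input row order
ordered = df.sort_values('Start_Date', kind='stable')
df['Last_dup'] = (~ordered['Policy_id'].duplicated(keep='last')).astype(int)

# Alternative: number rows within each group from the end; 0 marks the last one
df['Last_dup_gb'] = (ordered.groupby('Policy_id')
                            .cumcount(ascending=False)
                            .eq(0)
                            .astype(int))

# Only flag the last row of groups that really contain duplicates
is_dup_group = df['Policy_id'].duplicated(keep=False)
df['Last_dup_only'] = (df['Last_dup'].astype(bool) & is_dup_group).astype(int)
print(df)
```

The sort-then-duplicated version keeps the original answer's idea. The cumcount version can be convenient if you already group by Policy_id for other calculations.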