Pandas: remove SOME duplicate values based on conditions

Views: 88  Published: 2021/5/3 18:54:32  Tags: python pandas duplicates


Problem description

I have a dataset:

id    url     keep_if_dup
1     A.com   Yes
2     A.com   Yes
3     B.com   No
4     B.com   No
5     C.com   No

I want to remove duplicates, i.e. keep the first occurrence of each "url" value, BUT keep duplicates where the "keep_if_dup" field is Yes.

Expected output:

id    url     keep_if_dup
1     A.com   Yes
2     A.com   Yes
3     B.com   No
5     C.com   No

What I have tried:

Dataframe=Dataframe.drop_duplicates(subset='url', keep='first')

This, of course, does not take the "keep_if_dup" field into account. The output is:

id    url     keep_if_dup
1     A.com   Yes
3     B.com   No
5     C.com   No

Recommended answer

You can pass multiple boolean conditions to loc. The first condition keeps all rows where the column 'keep_if_dup' == 'Yes'; it is ORed (using |) with the inverted boolean mask of whether the value in the 'url' column is a duplicate of an earlier row:
In [79]:
df.loc[(df['keep_if_dup'] =='Yes') | ~df['url'].duplicated()]

Out[79]:
   id    url keep_if_dup
0   1  A.com         Yes
1   2  A.com         Yes
2   3  B.com          No
4   5  C.com          No

To overwrite your df, assign the result back to it:
df = df.loc[(df['keep_if_dup'] =='Yes') | ~df['url'].duplicated()]
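For a version that can be run end to end, here is a minimal sketch that rebuilds the example data from the question. The DataFrame construction and the names result and result_renumbered are assumptions added for illustration; only the filtering expression comes from the answer itself.

```python
import pandas as pd

# Rebuild the example dataset from the question (assumed construction).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "url": ["A.com", "A.com", "B.com", "B.com", "C.com"],
    "keep_if_dup": ["Yes", "Yes", "No", "No", "No"],
})

# Keep a row if it is flagged "Yes" OR it is the first occurrence of its url.
result = df.loc[(df["keep_if_dup"] == "Yes") | ~df["url"].duplicated()]

# The surviving rows keep their original index labels (0, 1, 2, 4).
# reset_index(drop=True) renumbers them if a contiguous index is wanted.
result_renumbered = result.reset_index(drop=True)

print(result)
```

Whether to renumber the index is a matter of preference; the filter itself does not require it.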

Breaking down the above shows the two boolean masks:
In [80]:
~df['url'].duplicated()

Out[80]:
0     True
1    False
2     True
3    False
4     True
Name: url, dtype: bool

In [81]:
df['keep_if_dup'] =='Yes'

Out[81]:
0     True
1     True
2    False
3    False
4    False
Name: keep_if_dup, dtype: bool
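
To see how the two masks combine, the sketch below ORs them element by element. The variable names first_occurrence, flagged_yes and keep are hypothetical, and the DataFrame is rebuilt from the question's data so the snippet runs on its own.

```python
import pandas as pd

# Example data from the question (assumed construction).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "url": ["A.com", "A.com", "B.com", "B.com", "C.com"],
    "keep_if_dup": ["Yes", "Yes", "No", "No", "No"],
})

first_occurrence = ~df["url"].duplicated()  # True for the first row of each url
flagged_yes = df["keep_if_dup"] == "Yes"    # True where duplicates may stay

# A row survives when at least one of the two masks is True.
keep = flagged_yes | first_occurrence
print(keep.tolist())  # [True, True, True, False, True]
```

Only the row at index 3 (id 4, the second B.com) is False in both masks, so it is the only row dropped.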

