How to drop duplicates based on two or more subset criteria in a Pandas DataFrame

Views: 90 Published: 2020/5/23 22:04:16 Tags: python pandas dataframe pandas-groupby


Problem description

Let's say this is my DataFrame:

import pandas as pd

df = pd.DataFrame({'bio': ['1', '1', '1', '4'],
                   'center': ['one', 'one', 'two', 'three'],
                   'outcome': ['f', 't', 'f', 'f']})

It looks like this:

  bio center outcome
0   1    one       f
1   1    one       t
2   1    two       f
3   4  three       f

I want to drop row 1 because it has the same bio and center as row 0. I want to keep row 2 because it has the same bio but a different center than row 0.

Something like this won't work, given the input that drop_duplicates expects, but it is what I am trying to do:

df.drop_duplicates(subset = 'bio' & subset = 'center' )

Any suggestions?

Edit: changed df a bit to fit the example in the correct answer.

Recommended answer

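Your syntax is wrong. subset takes all the column labels in a single list:

df.drop_duplicates(subset=['bio', 'center'])

Calling it with no arguments compares every column, including outcome, so with the df shown above it would keep row 1 (its outcome differs from row 0):

df.drop_duplicates()

The subset=['bio', 'center'] call returns the following: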

  bio center outcome
0   1    one       f
2   1    two       f
3   4  three       f
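
As a quick check, here is a minimal, self-contained sketch that rebuilds the frame from the question and deduplicates on bio and center together. The variable names deduped and all_columns are illustrative and do not come from the original post.

```python
import pandas as pd

# The frame from the question.
df = pd.DataFrame({
    'bio': ['1', '1', '1', '4'],
    'center': ['one', 'one', 'two', 'three'],
    'outcome': ['f', 't', 'f', 'f'],
})

# Rows count as duplicates only when bio AND center both match.
# The default keep='first' retains the first row of each group.
deduped = df.drop_duplicates(subset=['bio', 'center'])
print(deduped)  # rows with index 0, 2 and 3 remain

# With no subset, every column is compared, so row 1 survives
# because its outcome ('t') differs from row 0 ('f').
all_columns = df.drop_duplicates()
print(len(all_columns))  # 4
```

Listing only the columns that define a duplicate is what lets rows 0 and 1 collapse into one even though their outcome values differ.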

Take a look at the df.drop_duplicates documentation for syntax details. subset should be a column label or a sequence of column labels.
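
Beyond subset, the same method takes a keep parameter, and df.duplicated exposes the underlying boolean mask. The sketch below reuses the question's frame; cols, last_kept, none_kept and mask are illustrative names only.

```python
import pandas as pd

df = pd.DataFrame({
    'bio': ['1', '1', '1', '4'],
    'center': ['one', 'one', 'two', 'three'],
    'outcome': ['f', 't', 'f', 'f'],
})

# Columns that together define a duplicate.
cols = ['bio', 'center']

# keep='last' retains the last row of each duplicate group instead of the first.
last_kept = df.drop_duplicates(subset=cols, keep='last')

# keep=False drops every row that belongs to a duplicate group.
none_kept = df.drop_duplicates(subset=cols, keep=False)

# duplicated() marks the rows that drop_duplicates would remove (keep='first' by default).
mask = df.duplicated(subset=cols)

print(last_kept)
print(none_kept)
print(mask.tolist())
```

Inspecting the mask first can be a convenient way to see which rows would be removed before actually dropping them.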
