How do I remove rows with duplicate column values in a pandas DataFrame?


Question

I have a pandas DataFrame that looks like this:

'Column1'  'Column2'  'Column3'
'cat'      'bat'      'xyz'
'toy'      'flower'   'abc'
'cat'      'bat'      'lmn'

The values cat and bat are repeated in Column1 and Column2, so I want to treat those rows as duplicates, remove the repeated record, and keep only the first one. The resulting DataFrame should contain only:

'Column1'  'Column2'  'Column3'
'cat'      'bat'      'xyz'
'toy'      'flower'   'abc'

Recommended answer

Use drop_duplicates with subset set to the list of columns to check for duplicates, and keep='first' to keep the first row of each set of duplicates.

If the DataFrame is:

import pandas as pd

# Sample data; the single quotes are part of the string values themselves.
df = pd.DataFrame({'Column1': ["'cat'", "'toy'", "'cat'"],
                   'Column2': ["'bat'", "'flower'", "'bat'"],
                   'Column3': ["'xyz'", "'abc'", "'lmn'"]})
print(df)

Output:

  Column1   Column2 Column3
0   'cat'     'bat'   'xyz'
1   'toy'  'flower'   'abc'
2   'cat'     'bat'   'lmn'

Then:

result_df = df.drop_duplicates(subset=['Column1', 'Column2'], keep='first')
print(result_df)

Output:

  Column1   Column2 Column3
0   'cat'     'bat'   'xyz'
1   'toy'  'flower'   'abc'
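
If it helps to see which rows are being treated as duplicates before dropping them, the same check can be sketched with duplicated, which returns a boolean mask. The snippet below is only an illustration, not part of the original answer: the names key_cols, is_repeat, first_kept, last_kept, none_kept and renumbered are made up for this example, and the sample values are written without the embedded quote characters.

```python
import pandas as pd

# Same sample data as in the answer, without the embedded quote characters.
df = pd.DataFrame({'Column1': ['cat', 'toy', 'cat'],
                   'Column2': ['bat', 'flower', 'bat'],
                   'Column3': ['xyz', 'abc', 'lmn']})

key_cols = ['Column1', 'Column2']  # columns that define a duplicate

# Boolean mask: True for every repeat after the first occurrence.
is_repeat = df.duplicated(subset=key_cols, keep='first')

# Selecting the non-repeats gives the same rows as
# drop_duplicates(subset=key_cols, keep='first').
first_kept = df[~is_repeat]

# keep='last' keeps the final occurrence instead of the first.
last_kept = df.drop_duplicates(subset=key_cols, keep='last')

# keep=False drops every row that has a duplicate anywhere.
none_kept = df.drop_duplicates(subset=key_cols, keep=False)

# Original index labels are preserved; reset them if a fresh 0..n-1 index is wanted.
renumbered = last_kept.reset_index(drop=True)
```

Which keep value is appropriate depends on which record should survive: 'first' matches the question as asked, 'last' would keep the 'lmn' row instead of 'xyz', and False would leave only the 'toy' row.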
