Remove non-duplicated rows from pandas

Views: 84 Published: 2021/6/13 20:43:05 Tags: python pandas


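Problem description

This is rather simple, but I can't get my head around it. Say that for the following data frame, I want to keep only the rows with duplicated values in column y: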

>>> df
   x  y
0  1  1
1  2  2
2  3  2
3  4  3
4  5  3
5  6  3
6  7  5
7  8  2

The desired output looks like:

>>> df
   x  y
1  2  2
2  3  2
3  4  3
4  5  3
5  6  3
7  8  2

I tried this:

df[~df.duplicated('y')]

but I get this:

   x  y
0  1  1
1  2  2
3  4  3
6  7  5

Solution

Docs: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.duplicated.html

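keep : {'first', 'last', False}, default 'first'

first : Mark duplicates as True except for the first occurrence.
last : Mark duplicates as True except for the last occurrence.
False : Mark all duplicates as True.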
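To make the three `keep` options concrete, here is a small sketch that rebuilds the data frame from the question and compares the boolean masks side by side. The variable names (`mask_first`, `mask_last`, `mask_all`) are only illustrative and not part of the original post.

```python
import pandas as pd

# Data frame from the question.
df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6, 7, 8],
                   "y": [1, 2, 2, 3, 3, 3, 5, 2]})

# keep='first' (the default): every repeat after the first occurrence is True.
mask_first = df.duplicated("y")
# keep='last': every occurrence before the last one is True.
mask_last = df.duplicated("y", keep="last")
# keep=False: every row whose y value occurs more than once is True.
mask_all = df.duplicated("y", keep=False)

# Show the three masks next to the y column for comparison.
print(pd.DataFrame({"y": df["y"],
                    "first": mask_first,
                    "last": mask_last,
                    "all": mask_all}))
```

Only the `keep=False` mask flags the first occurrence of a repeated value as well, which is what is needed to keep every row that has a duplicate.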

This means you are looking for:

df[df.duplicated('y',keep=False)]

Output:

   x  y
1  2  2
2  3  2
3  4  3
4  5  3
5  6  3
7  8  2
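For anyone who wants to reproduce this end to end, the following sketch runs both the attempt from the question and the fix, and shows why they differ. The cross-check with `value_counts` is an alternative technique added here for verification and was not part of the original answer; the names `attempt`, `result`, and `alt` are illustrative.

```python
import pandas as pd

# Data frame from the question.
df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6, 7, 8],
                   "y": [1, 2, 2, 3, 3, 3, 5, 2]})

# The attempt from the question: negating the default mask keeps the first
# occurrence of every y value, including values that appear only once.
attempt = df[~df.duplicated("y")]

# The fix: keep=False marks every member of a duplicated group,
# so indexing with the mask keeps exactly the rows that have a duplicate.
result = df[df.duplicated("y", keep=False)]

# Cross-check (alternative approach, not from the original answer):
# keep rows whose y value occurs more than once according to value_counts.
counts = df["y"].map(df["y"].value_counts())
alt = df[counts > 1]

print(attempt)
print(result)
print(result.equals(alt))
```

Negating the default mask behaves like dropping duplicates on `y`, which is the opposite of what the question asks for.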

