Remove non-duplicated rows from pandas
This is rather simple, but I can't get my head around it. Let's say that, for the following data frame, I want to keep only the rows with duplicated values in column y:
>>> df
   x  y
0  1  1
1  2  2
2  3  2
3  4  3
4  5  3
5  6  3
6  7  5
7  8  2
The desired output looks like:
>>> df
   x  y
1  2  2
2  3  2
3  4  3
4  5  3
5  6  3
7  8  2
I tried this:
df[~df.duplicated('y')]
but I get this:
   x  y
0  1  1
1  2  2
3  4  3
6  7  5
Docs: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.duplicated.html
keep : {‘first’, ‘last’, False}, default ‘first’
first : Mark duplicates as True except for the first occurrence.
last : Mark duplicates as True except for the last occurrence.
False : Mark all duplicates as True.
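To make the three settings concrete, here is a minimal sketch that rebuilds the frame from the question and puts the mask for each `keep` value side by side. The column names `first`, `last` and `all` in the `masks` frame are only labels chosen for this illustration.

```python
import pandas as pd

# Rebuild the frame from the question.
df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6, 7, 8],
                   "y": [1, 2, 2, 3, 3, 3, 5, 2]})

# One boolean mask per keep setting, side by side for comparison.
masks = pd.DataFrame({
    "first": df.duplicated("y", keep="first"),  # default behaviour
    "last": df.duplicated("y", keep="last"),
    "all": df.duplicated("y", keep=False),
})
print(masks)
```

Negating the `first` column selects rows 0, 1, 3 and 6, which is exactly the result the attempt in the question produced.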
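Your attempt, ~df.duplicated('y'), uses the default keep='first', so it keeps only the first occurrence of each y value. With keep=False every row whose y value occurs more than once is marked, meaning you are looking for:
df[df.duplicated('y', keep=False)]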
Output:
   x  y
1  2  2
2  3  2
3  4  3
4  5  3
5  6  3
7  8  2
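For anyone who wants to reproduce this end to end, the following is a small self-contained sketch built from the data in the question. The `alt` variant is not part of the answer above; it is an assumed alternative that counts how often each y value occurs and should select the same rows.

```python
import pandas as pd

# Rebuild the frame from the question.
df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6, 7, 8],
                   "y": [1, 2, 2, 3, 3, 3, 5, 2]})

# keep=False flags every row whose y value occurs more than once.
result = df[df.duplicated("y", keep=False)]

# Alternative (an assumption, not from the answer): attach the size of each
# y group to its rows and keep the rows whose group has more than one member.
alt = df[df.groupby("y")["y"].transform("size") > 1]

print(result)
```

The `duplicated` form is shorter and states the intent directly; the group-size form may be handy if the threshold ever needs to be something other than "more than once".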