How to count duplicate rows in a pandas DataFrame?
Question
I am trying to count the duplicates of each type of row in my dataframe. For example, say that I have a dataframe in pandas as follows:
import pandas as pd

df = pd.DataFrame({'one': pd.Series([1., 1, 1]),
                   'two': pd.Series([1., 2., 1])})
I get a df that looks like this:
   one  two
0    1    1
1    1    2
2    1    1
I imagine the first step is to find all the different unique rows, which I do by:
df.drop_duplicates()
This gives me the following df:
   one  two
0    1    1
1    1    2
Now I want to take each row from the above df ([1 1] and [1 2]) and get a count of how many times each is in the initial df. My result would look something like this:
Row      Count
[1 1]    2
[1 2]    1
How should I go about doing this last step?
Here's a larger example to make it clearer:
df = pd.DataFrame({'one': pd.Series([True, True, True, False]),
                   'two': pd.Series([True, False, False, True]),
                   'three': pd.Series([True, False, False, False])})
This gives me:
     one  three    two
0   True   True   True
1   True  False  False
2   True  False  False
3  False  False   True
I want a result that tells me:
Row                    Count
[True True True]       1
[True False False]     2
[False False True]     1
Recommended answer
You can groupby on all the columns and call size. The index of the result holds the distinct row values, and the values are the number of times each row occurs:
In [28]: df.groupby(df.columns.tolist(), as_index=False).size()
Out[28]:
one    three  two
False  False  True     1
True   False  False    2
       True   True     1
dtype: int64
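For reference, here is a minimal, self-contained sketch of the same idea on the larger example from the question. It drops as_index=False so that size() returns a Series indexed by the row values, then uses reset_index to turn it into a flat table. The column name "count" is just a label chosen for this sketch. As far as I know, in pandas 1.1 and later passing as_index=False makes size() return a DataFrame with a size column instead of the Series shown above, and the same versions add DataFrame.value_counts(), which counts distinct rows directly. Treat both as version-dependent assumptions and check them against your installed pandas.

```python
import pandas as pd

# The larger example from the question.
df = pd.DataFrame({
    "one": [True, True, True, False],
    "two": [True, False, False, True],
    "three": [True, False, False, False],
})

# Group on every column; size() counts the rows in each group.
# The result is a Series whose MultiIndex holds the distinct row values.
counts = df.groupby(df.columns.tolist()).size()

# Move the index levels back into ordinary columns.
# "count" is a column name chosen for this sketch.
counts_df = counts.reset_index(name="count")
print(counts_df)

# Assumes pandas >= 1.1: count distinct rows in one call.
vc = df.value_counts()
print(vc)
```

If you only need to flag which rows are repeated rather than count them, df.duplicated(keep=False) returns a boolean mask marking every row that occurs more than once.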