Duplicate rows in pandas DF
Question
I have a DataFrame in pandas that looks like this:
Letters Numbers
A 1
A 3
A 2
A 1
B 1
B 2
B 3
C 2
C 2
I'm looking to count the number of identical rows and save the result in a third column. For example, the output I'm looking for:
Letters Numbers Events
A 1 2
A 2 1
A 3 1
B 1 1
B 2 1
B 3 1
C 2 2
An example of what I'm looking to do is here. The best idea I've come up with is to use value_counts(), but I think that only works on a single column. Another idea is to use duplicated(). Either way, I don't want to write a for loop. I'm pretty sure a Pythonic alternative to a for loop exists.
Recommended answer
You can group by these two columns and then compute the size of each group:
In [16]: df.groupby(['Letters', 'Numbers']).size()
Out[16]:
Letters  Numbers
A        1          2
         2          1
         3          1
B        1          1
         2          1
         3          1
C        2          2
dtype: int64
To get a DataFrame like the one in your example output, you can reset the index with reset_index.
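Putting the two steps together, a minimal sketch might look like the following. It assumes the sample data from the question, and the column name Events is taken from the desired output. The variable name `events` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Letters": ["A", "A", "A", "A", "B", "B", "B", "C", "C"],
    "Numbers": [1, 3, 2, 1, 1, 2, 3, 2, 2],
})

# Group by both columns and count the rows in each group, then move the
# group keys from the index back into ordinary columns. The name= argument
# sets the label of the resulting count column.
events = df.groupby(["Letters", "Numbers"]).size().reset_index(name="Events")

print(events)
```

Because groupby sorts the group keys by default, the rows should come out ordered by Letters and then Numbers, as in the requested output.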