Get count of occurrences inside two columns inside a csv
Question
Hello, I have the following set of data in a CSV file:
Group Size Some_other_column1 Some_other_column2
Short Small blabla1 blabla6
Moderate Medium babla3 blabla8
Short Small blabla2 blabla7
Moderate Small blabla4 blabla9
Tall Large blabla5 blabla10
Short Medium blabla11 blabla12
I would like to get the following result using python code:
Group Size Count Some_other_column1 Some_other_column2
Short Small 2 blabla1 blabla6
Moderate Medium 1 babla3 blabla8
Short Small 2 blabla2 blabla7
Moderate Small 1 blabla4 blabla9
Tall Large 1 blabla5 blabla10
Short Medium 1 blabla11 blabla12
Basically, I need to count the occurrences of each Group/Size pair and put the result in a new column called, say, "Count", keeping all the other columns the same. I can use pandas or anything else that helps.
For reference, there was another question asked on this topic, but it does not solve my problem since I have multiple columns that I need to keep: Python: get a frequency count based on two columns (variables) in pandas dataframe
There is another topic here: How to assign a name to the size() column? But this also does not answer my question, because I have two more columns ("Some_other_column1/2") that I do not want to indirectly drop by applying the method described at that link. Equally important, I do not want to merge pairs; I need to keep all of the rows, because they have different values in Some_other_column1/2.
Recommended answer
You need insert with GroupBy.transform of size:
# Insert 'Count' at column position 2: the size of each row's (Group, Size) group
df.insert(2, 'Count', df.groupby(['Group','Size'])['Size'].transform('size'))
print(df)
Group Size Count Some_other_column1 Some_other_column2
0 Short Small 2 blabla1 blabla6
1 Moderate Medium 1 babla3 blabla8
2 Short Small 2 blabla2 blabla7
3 Moderate Small 1 blabla4 blabla9
4 Tall Large 1 blabla5 blabla10
5 Short Medium 1 blabla11 blabla12
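For completeness, here is a minimal self-contained sketch of the same approach. It assumes the data is whitespace-separated as displayed in the question (a real CSV would normally use a comma separator and a file path instead of the in-memory string), and the variable names raw, counts and collapsed are only illustrative. It also shows why a plain size() does not fit: it collapses the rows to one per distinct pair.

```python
import io

import pandas as pd

# Sample data copied from the question; whitespace-separated here for brevity.
raw = """Group Size Some_other_column1 Some_other_column2
Short Small blabla1 blabla6
Moderate Medium babla3 blabla8
Short Small blabla2 blabla7
Moderate Small blabla4 blabla9
Tall Large blabla5 blabla10
Short Medium blabla11 blabla12
"""

# Assumption: for a real file, replace io.StringIO(raw) with the path
# and drop sep if the file is comma-separated.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# transform('size') returns one value per original row (the size of that
# row's group), so the result lines up with df and no rows are lost.
counts = df.groupby(["Group", "Size"])["Size"].transform("size")

# Place the new column at position 2, right after 'Size'.
df.insert(2, "Count", counts)
print(df)

# For contrast: size() yields one row per distinct (Group, Size) pair,
# which merges duplicate pairs and drops the other columns.
collapsed = df.groupby(["Group", "Size"]).size().reset_index(name="Count")
print(collapsed)
```

To write the result back out, something like df.to_csv('out.csv', index=False) should work, where out.csv is just a placeholder file name.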