Get count of occurrences inside two columns of a CSV


Question

Hello, I have the following data set in a CSV file:

Group           Size     Some_other_column1      Some_other_column2

Short          Small            blabla1                     blabla6    
Moderate       Medium           babla3                      blabla8
Short          Small            blabla2                     blabla7
Moderate       Small            blabla4                     blabla9
Tall           Large            blabla5                     blabla10
Short          Medium           blabla11                    blabla12

I would like to get the following result using Python code:

Group           Size      Count     Some_other_column1      Some_other_column2

Short          Small       2            blabla1                     blabla6
Moderate       Medium      1            babla3                      blabla8
Short          Small       2            blabla2                     blabla7
Moderate       Small       1            blabla4                     blabla9
Tall           Large       1            blabla5                     blabla10
Short          Medium      1            blabla11                    blabla12

Basically, I need to count how many times each Group-Size pair occurs and store that in a new column called, say, "Count", while keeping all the other columns unchanged. I can use pandas or anything else that helps.

For reference, another question was asked on this topic, but it does not solve my problem because I have multiple columns that I need to keep: Python: get a frequency count based on two columns (variables) in pandas dataframe

There is another related topic: How to assign a name to the size() column? It does not answer my question either, because I have two more columns (Some_other_column1/2) that I do not want to drop indirectly by applying the method described there. Equally important, I do not want to merge the pairs: I need to keep every row, because the rows differ in Some_other_column1/2.

Recommended answer

You need DataFrame.insert together with GroupBy.transform, using 'size':

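# df is the DataFrame loaded from the CSV (for example with pd.read_csv)
# transform('size') returns each row's group size, aligned to the original index;
# insert places it as the third column, named 'Count'
df.insert(2, 'Count', df.groupby(['Group','Size'])['Size'].transform('size'))
print(df)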
      Group    Size  Count Some_other_column1 Some_other_column2
0     Short   Small      2            blabla1            blabla6
1  Moderate  Medium      1             babla3            blabla8
2     Short   Small      2            blabla2            blabla7
3  Moderate   Small      1            blabla4            blabla9
4      Tall   Large      1            blabla5           blabla10
5     Short  Medium      1           blabla11           blabla12
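
For anyone who wants to try the answer end to end, here is a minimal self-contained sketch. It assumes the data is comma-separated; the inline `csv_text` string is a hypothetical stand-in for the real file, and `counts` is just an illustrative variable name.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for the real file (assumes comma separators)
csv_text = """Group,Size,Some_other_column1,Some_other_column2
Short,Small,blabla1,blabla6
Moderate,Medium,babla3,blabla8
Short,Small,blabla2,blabla7
Moderate,Small,blabla4,blabla9
Tall,Large,blabla5,blabla10
Short,Medium,blabla11,blabla12
"""

df = pd.read_csv(io.StringIO(csv_text))

# Size of each (Group, Size) group, broadcast back to every original row
counts = df.groupby(['Group', 'Size'])['Size'].transform('size')

# Put the counts in as the third column; all other columns and rows stay as they were
df.insert(2, 'Count', counts)
print(df)
```

By contrast, `df.groupby(['Group', 'Size']).size().reset_index(name='Count')` returns one row per distinct pair, which is why it merges the duplicate pairs and loses the other columns; `transform` keeps the original row count.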
