Number of occurrences of a pair of values in a dataframe

Question

I have a dataframe with the following columns:

Name, Surname, dateOfBirth, city, country

I am interested in finding the most common combination of name and surname, and how many times it occurs. It would also be nice to see a list of the top 10 combinations.

My idea for finding the top one was:

mostFreqComb= df.groupby(['Name','Surname'])['Name'].count().argmax()

But I don't think it gives me the correct answer. Help would be much appreciated!

Thanks, Nib

Recommended answer

For the performance implications of the solutions below, see "Pandas groupby.size vs series.value_counts vs collections.Counter with multiple series". The solutions are presented in order of performance, best first.

GroupBy.size

You can create a series of counts indexed by (Name, Surname) tuples using GroupBy.size:

res = df.groupby(['Name', 'Surname']).size().sort_values(ascending=False)

Because the counts are sorted in descending order, the most common combination is simply the first entry:

most_common = res.head(1)
most_common_dups = res[res == res.iloc[0]].index.tolist()  # handles duplicate top counts
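
As a minimal sketch of how this fits together, including the top 10 list the question asks for, the snippet below runs the same GroupBy.size approach on a small made-up dataframe. The sample rows and the names top_pair, top_count, top10 and tied are assumptions for illustration only; they do not come from the original question.

```python
import pandas as pd

# Hypothetical sample data; only the Name and Surname columns matter here.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Ann", "Cid", "Bob", "Ann"],
    "Surname": ["Lee", "Ray", "Lee", "Fox", "Ray", "Kim"],
})

# Count each (Name, Surname) pair and sort by count, largest first.
res = df.groupby(["Name", "Surname"]).size().sort_values(ascending=False)

top_pair = res.index[0]    # a (Name, Surname) tuple
top_count = res.iloc[0]    # how many times that pair occurs
top10 = res.head(10)       # up to ten most frequent pairs

# Every pair that shares the highest count (two pairs are tied in this data).
tied = res[res == res.iloc[0]].index.tolist()
```

Note that when several pairs share the top count, which of them lands in res.index[0] is not guaranteed, so the tied list is the safer way to report the winner.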

value_counts

Another approach is to construct a series of tuples and then apply pd.Series.value_counts:

res = pd.Series(list(zip(df.Name, df.Surname))).value_counts()

The result is a series of counts indexed by Name-Surname combinations, sorted from most common to least common.

name, surname = res.index[0]  # return most common
most_common_dups = res[res == res.max()].index.tolist()
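
The following is a small, self-contained sketch of the value_counts route on made-up data. It also shows DataFrame.value_counts with a subset of columns, which is available in pandas 1.1 and later and is not part of the original answer. The sample rows and the names top10 and alt are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Ann", "Cid", "Bob", "Ann"],
    "Surname": ["Lee", "Ray", "Lee", "Fox", "Ray", "Lee"],
})

# Series of (Name, Surname) tuples, counted and sorted most common first.
res = pd.Series(list(zip(df["Name"], df["Surname"]))).value_counts()

name, surname = res.index[0]   # the most common pair
top10 = res.head(10)           # up to ten most frequent pairs

# Alternative, assuming pandas >= 1.1: count rows over a subset of columns.
alt = df.value_counts(subset=["Name", "Surname"])
```

Both res and alt are sorted in descending order by default, so head(10) gives the top 10 list directly.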

collections.Counter

If you wish to create a dictionary of (name, surname): count entries, you can do so via collections.Counter:

from collections import Counter

zipper = zip(df.Name, df.Surname)
c = Counter(zipper)

Counter has useful methods such as most_common, which you can use to extract your result.
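
For instance, a minimal sketch of most_common on made-up data might look like the following. The sample rows and the names top_pair, top_count and top10 are assumptions for illustration.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Ann", "Cid", "Bob", "Ann"],
    "Surname": ["Lee", "Ray", "Lee", "Fox", "Ray", "Lee"],
})

# Map each (Name, Surname) tuple to the number of rows it appears in.
c = Counter(zip(df["Name"], df["Surname"]))

# most_common(n) returns a list of ((name, surname), count) pairs,
# ordered from the highest count to the lowest.
(top_pair, top_count), = c.most_common(1)
top10 = c.most_common(10)
```

most_common(10) returns fewer than ten entries when the data holds fewer than ten distinct pairs, as it does here.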
