Finding the most frequent occurrences of pairs in a list of lists


Question

I have a dataset that lists the authors of many technical reports. Each report can be authored by one or more people:

a = [
['John', 'Mark', 'Jennifer'],
['John'],
['Joe', 'Mark'],
['John', 'Anna', 'Jennifer'],
['Jennifer', 'John', 'Mark']
]

I need to find the most frequent pairs, that is, the people who have collaborated most often in the past:

['John', 'Jennifer'] - 3 times
['John', 'Mark'] - 2 times
['Mark', 'Jennifer'] - 2 times
etc...

How can I do this in Python?

Recommended answer

Use itertools.combinations together with a collections.Counter dict:

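from collections import Counter
from itertools import combinations

d = Counter()
for sub in a:
    # A report with a single author has no pairs to count.
    if len(sub) < 2:
        continue
    # Sort so each pair is always stored in the same order,
    # e.g. ('Jennifer', 'John') and never ('John', 'Jennifer').
    # Note that this sorts the sublists of a in place.
    sub.sort()
    for comb in combinations(sub, 2):
        d[comb] += 1

print(d.most_common())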
[(('Jennifer', 'John'), 3), (('John', 'Mark'), 2), (('Jennifer', 'Mark'), 2), (('Anna', 'John'), 1), (('Joe', 'Mark'), 1), (('Anna', 'Jennifer'), 1)]

most_common() returns the pairs ordered from most to least common. If you only want the n most common pairs, pass n as an argument: d.most_common(n). Pairs with the same count may come out in a different order than shown above, depending on the Python version.
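
As a short illustration of limiting the result, the sketch below rebuilds the counter from the question's data in a single expression (using sorted() instead of sort(), which is a small variation on the answer's loop) and then asks for only the top entries:

```python
from collections import Counter
from itertools import combinations

a = [
    ['John', 'Mark', 'Jennifer'],
    ['John'],
    ['Joe', 'Mark'],
    ['John', 'Anna', 'Jennifer'],
    ['Jennifer', 'John', 'Mark'],
]

# Count every alphabetically ordered pair across all reports.
d = Counter(
    pair
    for sub in a
    for pair in combinations(sorted(sub), 2)
)

top_one = d.most_common(1)
print(top_one)  # [(('Jennifer', 'John'), 3)]

# Only the counts of the three most common pairs; the two pairs that
# tie at 2 may be listed in either order depending on Python version.
top_three_counts = [count for _, count in d.most_common(3)]
print(top_three_counts)  # [3, 2, 2]
```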
