Find the most common element in a list

Problem description

What is an efficient way to find the most common element in a Python list?

My list items may not be hashable, so I can't use a dictionary. Also, in case of a draw, the item with the lowest index should be returned. Example:

>>> most_common(['duck', 'duck', 'goose'])
'duck'
>>> most_common(['goose', 'duck', 'duck', 'goose'])
'goose'

Solution

With so many solutions proposed, I'm amazed nobody's proposed what I'd consider an obvious one (for non-hashable but comparable elements) -- itertools.groupby. itertools offers fast, reusable functionality, and lets you delegate some tricky logic to well-tested standard library components. Consider for example:

import itertools
import operator

def most_common(L):
  # sort (item, original index) pairs so that equal items end up adjacent
  SL = sorted((x, i) for i, x in enumerate(L))
  # print('SL:', SL)
  groups = itertools.groupby(SL, key=operator.itemgetter(0))
  # auxiliary function to get "quality" for an item
  def _auxfun(g):
    item, iterable = g
    count = 0
    min_index = len(L)
    for _, where in iterable:
      count += 1
      min_index = min(min_index, where)
    # print('item %r, count %r, minind %r' % (item, count, min_index))
    return count, -min_index
  # pick the highest-count/earliest item
  return max(groups, key=_auxfun)[0]

This could be written more concisely, of course, but I'm aiming for maximal clarity. The two print statements can be uncommented to better see the machinery in action; for example, with prints uncommented:

print(most_common(['goose', 'duck', 'duck', 'goose']))

emits:

SL: [('duck', 1), ('duck', 2), ('goose', 0), ('goose', 3)]
item 'duck', count 2, minind 1
item 'goose', count 2, minind 0
goose

As you see, SL is a list of pairs, each pair an item followed by the item's index in the original list. The index is carried along to implement the key condition that, if more than one item shares the highest count, the result must be the earliest-occurring one.
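As a small illustration of the pairing step, here is a sketch on a made-up input whose items are lists: unhashable, so they could not serve as dictionary keys, but still comparable, so they can be sorted. The input `L` is an assumption chosen for the example and is not from the original answer.

```python
# Hypothetical input: lists are unhashable but can be compared with "<".
L = [[3], [1, 2], [1, 2], [3]]

# Pair every item with its original index, then sort the pairs.
# Equal items become adjacent, ordered by their original positions.
SL = sorted((x, i) for i, x in enumerate(L))
print(SL)
```

Because the index is the second element of each pair, it only breaks ties between equal items and never changes how distinct items are ordered.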

groupby groups by the item only (via operator.itemgetter). The auxiliary function, called once per grouping during the max computation, receives and internally unpacks a group - a tuple with two items (item, iterable) where the iterable's items are also two-item tuples, (item, original index) [[the items of SL]].
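To make the shape of those groups concrete, the following sketch materializes them for the SL value printed in the example output. The name `materialized` is introduced here purely for illustration; the answer's function never builds such a list, it consumes each group lazily.

```python
import itertools
import operator

SL = [('duck', 1), ('duck', 2), ('goose', 0), ('goose', 3)]
groups = itertools.groupby(SL, key=operator.itemgetter(0))

# Each group's iterable is lazy and can be consumed only once,
# so turn it into a list in order to display it.
materialized = [(item, list(pairs)) for item, pairs in groups]
for item, pairs in materialized:
    print(item, pairs)
```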

Then the auxiliary function uses a loop to determine both the count of entries in the group's iterable and the minimum original index; it returns those as a combined "quality key", with the min index sign-changed so the max operation will consider "better" those items that occurred earlier in the original list.
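The effect of the sign change can be seen in isolation with plain tuple comparison. This is only a sketch: the `candidates` list is hand-written here from the counts and minimum indices shown in the example output, not produced by the function itself.

```python
# (item, (count, -min_index)) pairs, taken from the example output.
candidates = [('duck', (2, -1)), ('goose', (2, 0))]

# Tuples compare element by element: the counts tie at 2, so the
# negated index decides, and 0 > -1 favors the earlier item.
best = max(candidates, key=lambda c: c[1])[0]
print(best)
```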

This code could be much simpler if it worried a little less about big-O issues in time and space, e.g....:

def most_common(L):
  groups = itertools.groupby(sorted(L))
  def _auxfun(g):
    # tuple parameters were removed in Python 3, so unpack explicitly
    item, iterable = g
    return len(list(iterable)), -L.index(item)
  return max(groups, key=_auxfun)[0]

Same basic idea, just expressed more simply and compactly... but, alas, it costs an extra O(N) auxiliary space (to materialize the groups' iterables as lists) and O(N squared) time in the worst case (to get the L.index of every item). While premature optimization is the root of all evil in programming, deliberately picking an O(N squared) approach when an O(N log N) one is available just goes too much against the grain of scalability!-)
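One way to gain confidence that the two variants really agree is to cross-check them on random inputs. The sketch below restates both in Python 3 under the hypothetical names `most_common_fast` and `most_common_simple` (these labels, the seed, and the input sizes are assumptions made for this example) and collects any inputs on which they differ.

```python
import itertools
import operator
import random

def most_common_fast(L):
    # O(N log N): sort (item, index) pairs, then scan each group once.
    SL = sorted((x, i) for i, x in enumerate(L))
    groups = itertools.groupby(SL, key=operator.itemgetter(0))
    def quality(g):
        _, pairs = g
        count = 0
        min_index = len(L)
        for _, where in pairs:
            count += 1
            min_index = min(min_index, where)
        return count, -min_index
    return max(groups, key=quality)[0]

def most_common_simple(L):
    # Shorter, but L.index makes it quadratic in the worst case.
    groups = itertools.groupby(sorted(L))
    def quality(g):
        item, same = g
        return len(list(same)), -L.index(item)
    return max(groups, key=quality)[0]

rng = random.Random(0)  # fixed seed so the check is repeatable
mismatches = []
for _ in range(200):
    L = [rng.randint(0, 5) for _ in range(rng.randint(1, 30))]
    if most_common_fast(L) != most_common_simple(L):
        mismatches.append(L)
print(len(mismatches))
```

Both functions rank items by the same (count, negated first index) key, and that key is unique per distinct item, so no mismatch is expected.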

Finally, for those who prefer "oneliners" to clarity and performance, a bonus 1-liner version with suitably mangled names:-).

from itertools import groupby as g
def most_common_oneliner(L):
  return max(g(sorted(L)), key=lambda xv: (len(list(xv[1])), -L.index(xv[0])))[0]
