在列表中查找最常见的元素 [英] Find the most common element in a list

查看:67
本文介绍了在列表中查找最常见的元素的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在Python列表中查找最常见元素的有效方法是什么?

What is an efficient way to find the most common element in a Python list?

我的列表项可能无法散列,因此不能使用字典. 同样在绘制的情况下,应返回索引最低的项目.示例:

My list items may not be hashable so can't use a dictionary. Also in case of draws the item with the lowest index should be returned. Example:

>>> most_common(['duck', 'duck', 'goose'])
'duck'
>>> most_common(['goose', 'duck', 'duck', 'goose'])
'goose'

推荐答案

提出了这么多解决方案,我很惊讶没有人提出我认为很明显的解决方案(对于不可哈希但可比较的元素)-[itertools.groupby] [1]. itertools提供快速,可重用的功能,并允许您将一些棘手的逻辑委托给经过良好测试的标准库组件.考虑例如:

With so many solutions proposed, I'm amazed nobody's proposed what I'd consider an obvious one (for non-hashable but comparable elements) -- [itertools.groupby][1]. itertools offers fast, reusable functionality, and lets you delegate some tricky logic to well-tested standard library components. Consider for example:

import itertools
import operator

def most_common(L):
  # get an iterable of (item, iterable) pairs
  SL = sorted((x, i) for i, x in enumerate(L))
  # print 'SL:', SL
  groups = itertools.groupby(SL, key=operator.itemgetter(0))
  # auxiliary function to get "quality" for an item
  def _auxfun(g):
    item, iterable = g
    count = 0
    min_index = len(L)
    for _, where in iterable:
      count += 1
      min_index = min(min_index, where)
    # print 'item %r, count %r, minind %r' % (item, count, min_index)
    return count, -min_index
  # pick the highest-count/earliest item
  return max(groups, key=_auxfun)[0]

这当然可以写得更简洁,但我的目标是最大程度地清晰.可以不加注释两个print语句,以更好地了解运行中的机制.例如, with 会打印未注释的内容:

This could be written more concisely, of course, but I'm aiming for maximal clarity. The two print statements can be uncommented to better see the machinery in action; for example, with prints uncommented:

print most_common(['goose', 'duck', 'duck', 'goose'])

发射:

SL: [('duck', 1), ('duck', 2), ('goose', 0), ('goose', 3)]
item 'duck', count 2, minind 1
item 'goose', count 2, minind 0
goose

如您所见,SL是成对的列表,每对一个项目,后跟原始列表中的项目索引(以实现以下关键条件:如果具有相同最高计数的最常见"项目是> 1,结果必须是最早出现的一个.)

As you see, SL is a list of pairs, each pair an item followed by the item's index in the original list (to implement the key condition that, if the "most common" items with the same highest count are > 1, the result must be the earliest-occurring one).

groupby仅按项目分组(通过operator.itemgetter).辅助功能在max计算期间每分组一次调用,它接收并在内部解压缩一个组-具有两个项(item, iterable)的元组,其中可迭代项也是两个项元组,(item, original index) [[ c3>]].

groupby groups by the item only (via operator.itemgetter). The auxiliary function, called once per grouping during the max computation, receives and internally unpacks a group - a tuple with two items (item, iterable) where the iterable's items are also two-item tuples, (item, original index) [[the items of SL]].

然后,辅助功能使用循环来确定组的可迭代条目的数量,最小原始索引;它将返回作为组合的质量关键字",并更改了最小索引符号,因此max操作将考虑更好"那些在原始列表中较早出现的项目.

Then the auxiliary function uses a loop to determine both the count of entries in the group's iterable, and the minimum original index; it returns those as combined "quality key", with the min index sign-changed so the max operation will consider "better" those items that occurred earlier in the original list.

如果担心时间和空间上的大O问题(例如....:

This code could be much simpler if it worried a little less about big-O issues in time and space, e.g....:

def most_common(L):
  groups = itertools.groupby(sorted(L))
  def _auxfun((item, iterable)):
    return len(list(iterable)), -L.index(item)
  return max(groups, key=_auxfun)[0]

相同的基本思想,只是表达得更简单,紧凑...但是,可惜的是,额外的O(N)辅助空间(以体现小组可迭代的列表)和O(N平方)时间(获得L.index每个项目).尽管过早的优化是编程中所有弊端的根源,但在O(N log N)可用的情况下,故意选择O(N平方)的方法与可扩展性背道而驰!-)

same basic idea, just expressed more simply and compactly... but, alas, an extra O(N) auxiliary space (to embody the groups' iterables to lists) and O(N squared) time (to get the L.index of every item). While premature optimization is the root of all evil in programming, deliberately picking an O(N squared) approach when an O(N log N) one is available just goes too much against the grain of scalability!-)

最后,对于那些更喜欢单线"而不是清晰度和性能的人,可以使用名称经过适当修饰的1线附加版本:-).

Finally, for those who prefer "oneliners" to clarity and performance, a bonus 1-liner version with suitably mangled names:-).

from itertools import groupby as g
def most_common_oneliner(L):
  return max(g(sorted(L)), key=lambda(x, v):(len(list(v)),-L.index(x)))[0]

这篇关于在列表中查找最常见的元素的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆