在集合中找到最频繁/常见的元素? [英] Finding the most frequent/common element in a collection?

查看:73
本文介绍了在集合中找到最频繁/常见的元素?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在集合中查找最频繁/最常见元素的最佳方法是什么?例如:

list = List(1, 3, 4, 4, 2)
list.mostCommon   // => 4        !! This is what I want !!

嗯.可以做的是先做一个groupBy,然后再按length进行map,然后选择最大的一个.这样您就会得到:

Map(1 -> List(1), 4 -> List(4, 4), 3 -> List(3), 2 -> List(2))
(...)
Map(1 -> 1, 4 -> 2, 3 -> 1, 2 -> 1)  // mapped by length. 4 -> 2  since there's two 4s

然后最后,选择映射到最高数字(2)的键(4). (<嵌套问题:最好的方法是什么?).如此简单的操作似乎需要做很多工作.

是否有更好/更惯用的方式来做到这一点?

解决方案

我必须说:

list.groupBy(identity).mapValues(_.size).maxBy(_._2)._1

或者只是:

list.groupBy(identity).maxBy(_._2.size)._1

对我来说真的不是那么多工作.

如果您担心只需要计数时为每个值建立列表的开销,可以执行以下操作:

list.foldLeft(Map.empty[Int, Int].withDefaultValue(0)) {
  case (m, v) => m.updated(v, m(v) + 1)
}.maxBy(_._2)._1

或者甚至跟踪最大移动量,以免在末尾进行额外遍历:

list.foldLeft(
  Map.empty[Int, Int].withDefaultValue(0), -1 -> Double.NegativeInfinity
) {
  case ((m, (maxV, maxCount)), v) =>
    val count = m(v) + 1
    if (count > maxCount) (m.updated(v, count), v -> count)
    else (m.updated(v, count), maxV -> maxCount)
}._2._1

这显然比上面的单行代码难读,因此,我建议坚持使用它们,除非您可以证明(即使用基准测试而不是推测)它们是您应用程序中的瓶颈. /p>

What is the best way to find the most frequent/common element in a collection? For example:

list = List(1, 3, 4, 4, 2)
list.mostCommon   // => 4        !! This is what I want !!

Hmm.. What one could do is to do a groupBy first and then map them by length, and then select the biggest one. So then you would get:

Map(1 -> List(1), 4 -> List(4, 4), 3 -> List(3), 2 -> List(2))
(...)
Map(1 -> 1, 4 -> 2, 3 -> 1, 2 -> 1)  // mapped by length. 4 -> 2  since there's two 4s

And then in the end, choose the key (4) that maps to the highest number (2). (nested question: what is the best way to do that?). But that seems like a lot of work for such a simple operation..?

Is there a better / more idiomatic way to do this?

解决方案

I have to say that:

list.groupBy(identity).mapValues(_.size).maxBy(_._2)._1

Or just:

list.groupBy(identity).maxBy(_._2.size)._1

Doesn't really seem like that much work to me.

If you're worried about the overhead of building up the lists for each value when you only need counts, you could do the following:

list.foldLeft(Map.empty[Int, Int].withDefaultValue(0)) {
  case (m, v) => m.updated(v, m(v) + 1)
}.maxBy(_._2)._1

Or even keep track of the maximum as you go, to avoid the extra traversal at the end:

list.foldLeft(
  Map.empty[Int, Int].withDefaultValue(0), -1 -> Double.NegativeInfinity
) {
  case ((m, (maxV, maxCount)), v) =>
    val count = m(v) + 1
    if (count > maxCount) (m.updated(v, count), v -> count)
    else (m.updated(v, count), maxV -> maxCount)
}._2._1

This is obviously much less readable than the one-liners above, though, so I'd recommend sticking with them unless you can show (i.e., with benchmarking, not speculation) that they're a bottleneck in your application.

这篇关于在集合中找到最频繁/常见的元素?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆