Scala: Remove duplicates in list of objects

Problem description

I've got a list of objects, List[Object], which are all instantiated from the same class. This class has a field, Object.property, which must be unique. What is the cleanest way to iterate over the list and remove all objects (except the first) that share the same property?

Recommended answer

list.groupBy(_.property).map(_._2.head)

Explanation: The groupBy method accepts a function that converts an element to a key for grouping. _.property is just shorthand for (elem: Object) => elem.property (the compiler generates a unique name, something like x$1). So now we have a Map[Property, List[Object]]. A Map[K, V] extends Traversable[(K, V)] (in Scala 2.13 and later, Traversable is a deprecated alias for Iterable), so it can be traversed like a list, but its elements are tuples. This is similar to Java's Map#entrySet(). The map method creates a new collection by iterating over each element and applying a function to it. In this case the function is _._2.head, which is shorthand for (elem: (Property, List[Object])) => elem._2.head. _2 is just a method of Tuple2 that returns the second element. The second element is a List[Object], and head returns its first element. Note that a Map does not guarantee iteration order, so the surviving objects may not come back in the order of the original list.
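The two steps described above can be made visible with a small sketch. The Obj case class, its label field, and the sample data are hypothetical stand-ins for the question's class, and the result is converted with toList only to give it a concrete type:

```scala
object GroupByDemo {
  // Hypothetical stand-in for the question's class; `property` must be unique.
  final case class Obj(property: Int, label: String)

  val list: List[Obj] =
    List(Obj(1, "a"), Obj(2, "b"), Obj(1, "c"), Obj(3, "d"), Obj(2, "e"))

  // Step 1: group by key. Within each group the original list order is kept.
  val grouped: Map[Int, List[Obj]] = list.groupBy(_.property)

  // Step 2: take the first object of every group.
  // The placeholder form and the explicit lambda are equivalent.
  val short: List[Obj] = grouped.map(_._2.head).toList
  val explicit: List[Obj] =
    grouped.map((elem: (Int, List[Obj])) => elem._2.head).toList
}
```

Both results contain the objects labelled a, b and d, but their relative order depends on the map's iteration order, so compare them as sets rather than as lists.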

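To get the result as the type you want (this relies on breakOut, which exists only in Scala 2.12 and earlier; it was removed in Scala 2.13, where calling .toList on the result achieves the same thing):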

import collection.breakOut
val l2: List[Object] = list.groupBy(_.property).map(_._2.head)(breakOut)

To explain briefly, map actually expects two arguments: a function and an object that is used to construct the result. In the first code snippet you don't see the second argument because it is marked as implicit and is therefore supplied by the compiler from the predefined values in scope. The result type is usually derived from the container being mapped, which is usually a good thing: map on a List returns a List, map on an Array returns an Array, and so on. In this case, however, we want to state which container we want as the result. This is where the breakOut method comes in. It constructs a builder (the thing that builds results) by looking only at the desired result type. It is a generic method, and the compiler can infer its type parameters because we explicitly typed l2 as List[Object].

Or, to preserve order (assuming Object#property is of type Property):

list.foldRight((List[Object](), Set[Property]())) {
  // cum is the accumulator: (objects kept so far, properties seen so far)
  case (o, cum@(objects, props)) =>
    if (props(o.property)) cum else (o :: objects, props + o.property)
}._1

foldRight is a method that accepts an initial result and a function that takes an element together with the current result and returns an updated result. The method visits each element, updates the result by applying the function, and returns the final result. We go from right to left (rather than left to right with foldLeft) because we are prepending to objects: prepending is O(1), whereas appending is O(N). Be aware that, because elements are visited from right to left, the object kept for each property is the last one in the list with that property rather than the first. Also note the style here: a pattern match is used to extract the elements of the tuple.

In this case, the initial result is a pair (tuple) of an empty list and an empty set. The list is the result we're interested in, and the set keeps track of the properties we have already encountered. In each iteration we check whether the set props already contains the property. (In Scala, obj(x) is translated to obj.apply(x). In Set, the method is def apply(a: A): Boolean; that is, it accepts an element and returns true if the element is in the set and false otherwise.) If the property has already been encountered, the result is returned as-is. Otherwise the result is updated to contain the object (o :: objects) and the property is recorded (props + o.property).
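Since foldRight visits elements from right to left, the accumulator described above retains the last object seen for each property. If the first one must survive, as the question asks, one possible variant is to fold from the left and reverse the list once at the end. The following is only a sketch: the Obj case class, its label field, and the names keepFirst and keepLast are assumptions made for illustration.

```scala
object FoldDemo {
  // Hypothetical stand-in for the question's class.
  final case class Obj(property: Int, label: String)

  val list: List[Obj] =
    List(Obj(1, "a"), Obj(2, "b"), Obj(1, "c"), Obj(3, "d"))

  // Left-to-right fold: the first object per property is kept.
  def keepFirst(xs: List[Obj]): List[Obj] =
    xs.foldLeft((List.empty[Obj], Set.empty[Int])) {
      case (acc @ (objects, props), o) =>
        if (props(o.property)) acc                // props(x) is props.apply(x)
        else (o :: objects, props + o.property)   // prepending is O(1)
    }._1.reverse                                  // undo the reversal from prepending

  // Right-to-left fold with the same accumulator, for comparison.
  def keepLast(xs: List[Obj]): List[Obj] =
    xs.foldRight((List.empty[Obj], Set.empty[Int])) {
      case (o, acc @ (objects, props)) =>
        if (props(o.property)) acc else (o :: objects, props + o.property)
    }._1
}
```

On the sample list, keepFirst yields the objects labelled a, b, d, while keepLast yields b, c, d. Both preserve the relative order of the surviving objects.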

Update: @andreypopp wanted a generic method:

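// Targets Scala 2.12 and earlier: IterableLike and CanBuildFrom were removed in Scala 2.13.
import scala.collection.IterableLike
import scala.collection.generic.CanBuildFrom

class RichCollection[A, Repr](xs: IterableLike[A, Repr]){
  def distinctBy[B, That](f: A => B)(implicit cbf: CanBuildFrom[Repr, A, That]): That = {
    val builder = cbf(xs.repr)   // builder for the same kind of collection as xs
    val i = xs.iterator
    var set = Set[B]()           // keys seen so far
    while (i.hasNext) {
      val o = i.next()
      val b = f(o)
      if (!set(b)) {             // first time this key appears: keep the element
        set += b
        builder += o
      }
    }
    builder.result()
  }
}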

implicit def toRich[A, Repr](xs: IterableLike[A, Repr]) = new RichCollection(xs)

Usage:

scala> list.distinctBy(_.property)
res7: List[Obj] = List(Obj(1), Obj(2), Obj(3))

Also note that this is pretty efficient as we are using a builder. If you have really large lists, you may want to use a mutable HashSet instead of a regular set and benchmark the performance.
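As a rough illustration of the mutable HashSet suggestion, here is a minimal sketch. The helper name distinctByKey and the Obj case class are hypothetical, and the helper is restricted to List for brevity instead of being generic over collection types. No performance claim is made; benchmark it against the immutable Set version on your own data. (Scala 2.13 and later also ship a built-in distinctBy on collections, which makes a hand-written helper unnecessary there.)

```scala
import scala.collection.mutable

object DistinctByDemo {
  // Hypothetical stand-in for the question's class.
  final case class Obj(property: Int)

  // Keeps the first element for each key, preserving the original order.
  def distinctByKey[A, K](xs: List[A])(key: A => K): List[A] = {
    val seen = mutable.HashSet.empty[K]   // keys encountered so far
    val out  = List.newBuilder[A]
    xs.foreach { x =>
      // add returns true only if the key was not already in the set
      if (seen.add(key(x))) out += x
    }
    out.result()
  }

  val deduped: List[Obj] =
    distinctByKey(List(Obj(1), Obj(2), Obj(1), Obj(3), Obj(2)))(_.property)
}
```

Here deduped is List(Obj(1), Obj(2), Obj(3)).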
