subsetOf versus forall contains

Problem description

Suppose I have:

case class X(...)
val xs: Seq[X] = ... // some method result
val ys: Seq[X] = ... // some other method result

While the following holds:

xs.distinct.sameElements(xs) // true
ys.distinct.sameElements(ys) // true

I am faced with:

xs forall(ys contains _)    // true
xs.toSet subsetOf ys.toSet  // false

Why? I mean, it's clear that making a Set out of a Seq picks an arbitrary element when there are duplicates, but there are no duplicates here, as "(...).distinct.sameElements(...)" shows.

I certainly need a deeper understanding of the kind of equality check...

After a long search, I found the problem and condensed it to the following:

My elements are not the same; I still have to take a closer look at why distinct.sameElements isn't complaining. But meanwhile a new question arose:

Consider this:

val rnd = scala.util.Random
def int2Label(i: Int) = "[%4s]".format(Seq.fill(rnd.nextInt(4))(i).mkString)
val s = Seq(1,2,3,5) // keys chosen to match the printed output below

// as expected :
val m1: Map[Int,String] = s.map(i => (i,int2Label(i))).toMap
println(m1) // Map(5 -> [ 555], 1 -> [    ], 2 -> [  22], 3 -> [    ])
println(m1) // Map(5 -> [ 555], 1 -> [    ], 2 -> [  22], 3 -> [    ])

// but accessing m2 several times yields different results. Why?
val m2: Map[Int,String] = s.map(i => (i,i)).toMap.mapValues { int2Label(_) }
println(m2) // Map(5 -> [   5], 1 -> [  11], 2 -> [  22], 3 -> [ 333])
println(m2) // Map(5 -> [  55], 1 -> [  11], 2 -> [    ], 3 -> [    ])

So the elements in my first two sequences aren't the same, because they depend on an m2-like construct, and so they are different each time I access them.

My new question is: why does m2 behave like a function, in contrast to m1, although both are immutable maps? That isn't intuitive to me.

Recommended answer

The most common reasons for problems in this area--testing set equality and the like--are

  1. hashCode does not agree with equals
  2. Your values are not stable (so previous hashCode does not agree with current equals)

The reason this matters is that distinct and toSet use hash codes to build sets, whereas contains simply runs over the collection with an exists:

xs forall(ys contains _) == xs forall (x => ys exists (y => x==y) )

This is made more complicated by the fact that many sets don't start using hash codes until they're larger than some minimal size (usually 4), so you don't always notice this with testing. But let's prove it to ourselves:

class Liar(val s: String) {   // val, so that l.s is accessible in equals
  override def equals(o: Any) = o match {
    case l: Liar => s == l.s
    case _ => false
  }
  // No hashCode override!
}
val strings = List("Many","song","lyrics","go","na","na","na","na")
val lies = strings.map(s => new Liar(s))
val truly_distinct = lies.take(5)
lies.length          // 8
lies.distinct.length // 8!
lies.toSet.size      // 8!
lies forall( truly_distinct contains _ )   // True, because it's true
lies.toSet subsetOf truly_distinct.toSet   // False--not even the same size!
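
The remark about small sets can be checked in the same way. The following is only a sketch: Fibber is a hypothetical stand-in for Liar (content-based equals, no hashCode override), and the behaviour relies on the standard library's specialised small immutable sets, which compare elements with == until a fifth distinct element forces a switch to a hash-based set.

```scala
// Fibber is a hypothetical stand-in for Liar: equals compares content,
// hashCode is deliberately NOT overridden.
class Fibber(val s: String) {
  override def equals(o: Any): Boolean = o match {
    case f: Fibber => s == f.s
    case _         => false
  }
}

object SmallSetDemo {
  // Fewer than five distinct elements: the set is built with == comparisons,
  // so the three equal elements collapse into one.
  val small = List("na", "na", "na").map(new Fibber(_)).toSet

  // The fifth distinct element switches the set to hashing, so the second
  // "e" is (barring an identity-hash collision) no longer recognised.
  val large = List("a", "b", "c", "d", "e", "e").map(new Fibber(_)).toSet

  def main(args: Array[String]): Unit = {
    println(small.size) // 1
    println(large.size) // typically 6, although only 5 elements are distinct by equals
  }
}
```

This is why a test with only a handful of elements may pass even though equals and hashCode disagree.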

Okay, so now we know that for most of these operations, matching up hashCode and equals is a Good Thing.
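
For contrast, here is a minimal sketch of the same experiment with a matching hashCode. Honest is a hypothetical class, not part of the original example; it uses the same equals as Liar and delegates hashCode to the wrapped string.

```scala
// Honest is a hypothetical counterpart to Liar: same equals,
// plus a hashCode that agrees with it.
class Honest(val s: String) {
  override def equals(o: Any): Boolean = o match {
    case h: Honest => s == h.s
    case _         => false
  }
  override def hashCode: Int = s.hashCode
}

object HonestDemo {
  val strings = List("Many", "song", "lyrics", "go", "na", "na", "na", "na")
  val honest = strings.map(s => new Honest(s))
  val trulyDistinct = honest.take(5)

  val distinctLength = honest.distinct.length               // 5
  val setSize = honest.toSet.size                           // 5
  val viaContains = honest.forall(trulyDistinct.contains)   // true
  val viaSubsetOf = honest.toSet.subsetOf(trulyDistinct.toSet) // true

  def main(args: Array[String]): Unit =
    println((distinctLength, setSize, viaContains, viaSubsetOf))
}
```

With equals and hashCode in agreement, the forall/contains check and the subsetOf check give the same answer.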

Warning: in Java, mismatches happen frequently even with primitives:

new java.lang.Float(1.0) == new java.lang.Integer(1)                       // True
(new java.lang.Float(1.0)).hashCode == (new java.lang.Integer(1)).hashCode // Uh-oh

But Scala now at least catches that (hopefully every time):

(new java.lang.Float(1.0)).## == (new java.lang.Integer(1)).##   // Whew

Case classes also do this properly, so we're left with three possibilities

  1. You overrode equals but not hashCode to match
  2. Your values are not stable
  3. There is a bug and Java wrapped primitive hashCode mismatch is coming back to bite you

The first one is easy.

The second one seems to be your problem, and it arises from the fact that mapValues actually creates a view of the original collection, not a new collection. (filterKeys does this also.) Personally, I think this is a questionable choice of design, since normally when you have a view and you want to make a single concrete instance of it, you .force it. But default maps don't have a .force because they don't realize that they might be views. So you have to resort to things like

myMap.map{ case (k,v) => (k, /* something that produces a new v */) }
myMap.mapValues(v => /* something that produces a new v */).view.force
Map() ++ myMap.mapValues(v => /* something that produces a new v */)

This is really important if you're doing things like file IO to map your values (e.g. if your values are filenames and you're mapping to their contents) and you don't want to read the file over and over again.

But your case--where you're assigning random values--is another where it is important to pick a single copy, not recreate the values over and over.
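
To see the difference between the view-like mapValues and a strict map, one can count how often the mapping function runs. This is a sketch under assumptions: calls and label are hypothetical names introduced for illustration, and on Scala 2.13 or later Map.mapValues still compiles but is deprecated in favour of .view.mapValues(f).toMap.

```scala
object MapValuesDemo {
  var calls = 0 // counts how often the mapping function runs
  def label(i: Int): String = { calls += 1; s"[$i]" }

  val base = Map(1 -> 1, 2 -> 2, 3 -> 3)

  // View-like: nothing is computed up front; label runs on every lookup.
  val lazyLabels = base.mapValues(label)
  lazyLabels(1); lazyLabels(1)
  val callsAfterTwoLazyLookups = calls // 2: one call per lookup

  // Strict: label runs once per entry while the new map is built.
  calls = 0
  val strictLabels = base.map { case (k, v) => (k, label(v)) }
  strictLabels(1); strictLabels(1)
  val callsAfterStrictLookups = calls // 3: lookups add no further calls

  def main(args: Array[String]): Unit =
    println((callsAfterTwoLazyLookups, callsAfterStrictLookups))
}
```

If label were random or did file IO, the view-like version would produce fresh results (or repeat the IO) on every access, which is the behaviour seen with m2.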
