包含NaN的集合的最小值/最大值(订购中的处理不可比性) [英] min/max of collections containing NaN (handling incomparability in ordering)

查看:97
本文介绍了包含NaN的集合的最小值/最大值(订购中的处理不可比性)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

由于以下行为,我刚遇到一个令人讨厌的错误:

I just ran into a nasty bug as a result of the following behavior:

scala> List(1.0, 2.0, 3.0, Double.NaN).min
res1: Double = NaN

scala> List(1.0, 2.0, 3.0, Double.NaN).max
res2: Double = NaN

我知道,对于成对比较,有时最好具有max(NaN, 0) = NaN,这可能是java.lang.Double.compare遵循此约定的原因(似乎是min和max函数退化为检查集合是否包含NaN.由于存在更多适当的方法来检查NaN的存在(例如collection.find(_.isNaN)),因此最好在集合上保持语义上有意义的最小/最大.

I understand that for a pairwise comparison it may sometimes be preferable to have max(NaN, 0) = NaN and this is probably the reason why java.lang.Double.compare follows this convention (there seems to be an IEEE standard for that). For a collection however, I really think that this is a strange convention. After all the above collection does contain valid numbers; and these numbers have a clear maximum and minimum. In my opinion, the conception that the maximum number of the collection is not a number is a contradiction since, well, NaN is not a number, so it cannot be the maximum or minimum "number" of a collection -- unless there are no valid numbers at all; in this case it makes perfect sense that the maximum "is not a number". Semantically the min and max functions degenerate to a check whether the collection contains a NaN. Since there are more appropriate ways to check for the existence of NaNs (e.g. collection.find(_.isNaN)) it would be great to maintain a semantically meaningful min/max on collections.

所以我的问题是:获得忽略NaNs存在的行为的最佳方法是什么?我看到两种可能性:

So my question is: What is the best approach to obtain the behavior to ignore the existence of NaNs? I see two possibilities:

  1. 在调用最小/最大之前过滤NaN.由于这需要在所有地方对问题进行明确处理,并且可能会导致性能下降,因此,我宁愿选择更简单的方法.

  1. Filtering NaNs before calling min/max. Since this requires explicit handling of the issue in all places and may incur performance penalties I would prefer something easier.

最好有一种可以忽略NaN的排序,可以在需要时将其用作隐式排序.我尝试了以下方法:

It would be great to have a kind of NaN-ignoring ordering which can be used as an implicit ordering wherever necessary. I tried the following:

  object NanAwareOrdering extends Ordering[Double] {
    def compare(x: Double, y: Double) = {
      if (x.isNaN()) {
        +1 // without checking x, return y < x
      } else if (y.isNaN()) {
        -1 // without checking y, return x < y
      } else {
        java.lang.Double.compare(x, y)
      }
    }
  }

但是,这种方法似乎取决于我是否对找到最小值或最大值感兴趣,即:

However, this approach seems to depend on whether I'm interesting in finding the min or max, i.e.:

 scala> List(1.0, 2.0, 3.0, Double.NaN).min(NanAwareOrdering)
 res7: Double = 1.0

 scala> List(1.0, 2.0, 3.0, Double.NaN).max(NanAwareOrdering)
 res8: Double = NaN

这意味着我将必须具有两个NanAwareOrdering,具体取决于我想要的是最小值还是最大值,而这将是不允许使用implicit val的.因此,我的问题是:如何定义一种可以同时处理两种情况的排序?

This means that I would have to have two NanAwareOrdering depending on whether I want the minimum or the maximum, which would forbid to have an implicit val. Therefore my question is: How can I define an ordering in a way to handle both cases at once?

更新:

出于完整性考虑:在分析问题的过程中,我意识到前提退化为NaN检查"实际上是错误的.实际上,我认为这更加难看:

For the sake of completeness: In the course of analyzing the issue I realized that the premise "degenerates to a check for NaN" is actually wrong. In fact, I think it is even more ugly:

scala> List(1.0, Double.NaN).min
res1: Double = NaN

scala> List(Double.NaN, 1.0).min
res2: Double = 1.0

推荐答案

免责声明:我将在问题中添加自己的答案,以防万一其他人仍然对这个问题的更多细节感兴趣.

我看起来这个问题比我预期的要复杂.正如阿列克谢·罗曼诺夫(Alexey Romanov)所指出的那样,不可比性的概念要求max/min函数采取部分顺序.不幸的是,Alexey也是正确的,即基于偏序的一般max/min函数没有意义:考虑一下偏序仅定义某些组内关系的情况,但组本身完全独立于彼此(例如,元素{a,b,c,d}仅具有两个关系a< b和c< d;我们将有两个max/min).在这方面,甚至可能有人争辩说,正式的max/min应该总是 返回两个值NaN 各自的有效min/max,因为NaN本身也是其极值自己的关系组.

I looks like this issue is more complex than I expected. As Alexey Romanov has already pointed out, the notion of incomparability would require the max/min functions to take a partial order. Unfortunately, Alexey is also right in the point, that a general max/min function based on a partial order does not make sense: Think of the case where the partial ordering only defines relations within certain groups, but the groups themselves are completely independent from each other (for instance, the elements {a, b, c, d} with just the two relations a < b and c < d; we would have two max/min). In that regard one may even argue that formally max/min should always return two values, NaN and the respective valid min/max since NaN itself is also an extremal value in its own relation group.

因此,由于部分订单过于笼统/过于复杂,因此min/max函数采用Ordering.不幸的是,总订单不允许出现不可比性的概念.回顾总订单的三个定义属性,很明显地,忽略NaN"在形式上是不可能的:

So as a result of partial orders being too general/complex, the min/max functions take an Ordering. Unfortunately a total order does not allow the notion of incomparability. Reviewing the three defining properties of a total order makes it pretty obvious that "ignoring NaNs" is formally impossible:

  1. 如果a≤b并且b≤a,则a = b(反对称)
  2. 如果a≤b且b≤c,则a≤c(传递性)
  3. a≤b或b≤a(总计)

...并练习...

因此,当尝试提出Ordering的实现以实现我们所需的最小/最大行为时,很明显,我们必须违反某些规则(并承担后果).在TraversableOncemin/max/minBy/maxBy的实现遵循以下模式(对于min):

... and practice ...

So when trying to come up with an implementation of an Ordering to fulfill our desired min/max behavior it is clear that we have to violate something (and bear the consequences). The implementation of min/max/minBy/maxBy in TraversableOnce follows the pattern (for min):

reduceLeft((x, y) => if (cmp.lteq(x, y)) x else y)

变体的

gteq.这给了我左偏"比较的想法,即:

and gteq for the max variants. This gave me the idea of "left biasing" the comparison, i.e.:

x   <comparison_operator> NaN    is always true to keep x in the reduction
NaN <comparison_operator> x      is always false to inject x into the reduction

这样的左偏"排序的最终实现看起来像这样:

The resulting implementation of such a "left biased" ordering would look like this:

object BiasedOrdering extends Ordering[Double] {
  def compare(x: Double, y: Double) = java.lang.Double.compare(x, y) // this is inconsistent, but the same goes for Double.Ordering

  override def lteq(x: Double, y: Double): Boolean  = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) true  else compare(x, y) <= 0
  override def gteq(x: Double, y: Double): Boolean  = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) true  else compare(x, y) >= 0
  override def lt(x: Double, y: Double): Boolean    = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) false else compare(x, y) < 0
  override def gt(x: Double, y: Double): Boolean    = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) false else compare(x, y) > 0
  override def equiv(x: Double, y: Double): Boolean = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) true  else compare(x, y) == 0

}

...已分析:

当前我正在尝试找出:

... analyzed:

Currently I'm trying to find out:

  • 此订单与默认订单的比较方式
  • 我们在哪里违反总订单属性,
  • 以及潜在的问题是什么.

我正在将其与Scala的默认顺序Ordering.Double以及直接从java.lang.Double.compare得出的以下顺序进行比较:

I'm comparing this to Scala's default order Ordering.Double and the following ordering which is directly derived from java.lang.Double.compare:

object OrderingDerivedFromCompare extends Ordering[Double] {
  def compare(x: Double, y: Double) = {
    java.lang.Double.compare(x, y)
  }
}

Scala默认顺序Ordering.Double的一个有趣特性是它会使用语言的本机数值比较运算符(<<===>=>)覆盖所有比较成员函数,因此比较结果完全相同,就好像我们将直接与这些运算符进行比较一样.下面显示了NaN和三个顺序的有效数字之间的所有可能关系:

One interesting property of Scala's default order Ordering.Double is that it overwrites all comparison member functions by the language's native numerical comparison operators (<, <=, ==, >=, >) so the comparison results are identical as if we would compare directly with these operators. The following shows all possible relations between a NaN and a valid number for the three orderings:

Ordering.Double             0.0 >  NaN = false
Ordering.Double             0.0 >= NaN = false
Ordering.Double             0.0 == NaN = false
Ordering.Double             0.0 <= NaN = false
Ordering.Double             0.0 <  NaN = false
OrderingDerivedFromCompare  0.0 >  NaN = false
OrderingDerivedFromCompare  0.0 >= NaN = false
OrderingDerivedFromCompare  0.0 == NaN = false
OrderingDerivedFromCompare  0.0 <= NaN = true
OrderingDerivedFromCompare  0.0 <  NaN = true
BiasedOrdering              0.0 >  NaN = true
BiasedOrdering              0.0 >= NaN = true
BiasedOrdering              0.0 == NaN = true
BiasedOrdering              0.0 <= NaN = true
BiasedOrdering              0.0 <  NaN = true

Ordering.Double             NaN >  0.0 = false
Ordering.Double             NaN >= 0.0 = false
Ordering.Double             NaN == 0.0 = false
Ordering.Double             NaN <= 0.0 = false
Ordering.Double             NaN <  0.0 = false
OrderingDerivedFromCompare  NaN >  0.0 = true
OrderingDerivedFromCompare  NaN >= 0.0 = true
OrderingDerivedFromCompare  NaN == 0.0 = false
OrderingDerivedFromCompare  NaN <= 0.0 = false
OrderingDerivedFromCompare  NaN <  0.0 = false
BiasedOrdering              NaN >  0.0 = false
BiasedOrdering              NaN >= 0.0 = false
BiasedOrdering              NaN == 0.0 = false
BiasedOrdering              NaN <= 0.0 = false
BiasedOrdering              NaN <  0.0 = false

Ordering.Double             NaN >  NaN = false
Ordering.Double             NaN >= NaN = false
Ordering.Double             NaN == NaN = false
Ordering.Double             NaN <= NaN = false
Ordering.Double             NaN <  NaN = false
OrderingDerivedFromCompare  NaN >  NaN = false
OrderingDerivedFromCompare  NaN >= NaN = true
OrderingDerivedFromCompare  NaN == NaN = true
OrderingDerivedFromCompare  NaN <= NaN = true
OrderingDerivedFromCompare  NaN <  NaN = false
BiasedOrdering              NaN >  NaN = false
BiasedOrdering              NaN >= NaN = true
BiasedOrdering              NaN == NaN = true
BiasedOrdering              NaN <= NaN = true
BiasedOrdering              NaN <  NaN = false

我们可以看到:

  • only OrderingDerivedFromCompare fulfills the total order properties. Based on this result the reasoning behind java.lang.Double.compare becomes much more clear: Placing NaN at the upper end of the total order simply avoids any contradiction!
  • Scala's default order and the biased order violate many totality conditions. Scala's default order always returns false, while for the biased order it depends on the position. Since both lead to contradictions it is difficult to see which may lead to more severe issues.

由于我们当前的实际问题,最小/最大函数起作用.对于OrderingDerivedFromCompare,现在很清楚我们必须获得什么-NaN只是最大值,因此无论列表中元素的排列方式如何,很显然要以maxN的形式获得它:

Now to our actual problem at hand, the min/max functions. For OrderingDerivedFromCompare it is now clear what we have to obtain -- NaN is simply the largest value, so it's clear to obtain it as max, irrespective of how the elements in the list are arranged:

OrderingDerivedFromCompare  List(1.0, 2.0, 3.0, Double.NaN).min = 1.0
OrderingDerivedFromCompare  List(Double.NaN, 1.0, 2.0, 3.0).min = 1.0
OrderingDerivedFromCompare  List(1.0, 2.0, 3.0, Double.NaN).max = NaN
OrderingDerivedFromCompare  List(Double.NaN, 1.0, 2.0, 3.0).max = NaN

现在是Scala的默认排序.令我震惊的是,这种情况实际上比我的问题中提到的还要复杂:

Now to Scala's default ordering. I was deeply shocked to see that the situation is actually even more intricate than mentioned in my question:

Ordering.Double             List(1.0, 2.0, 3.0, Double.NaN).min = NaN
Ordering.Double             List(Double.NaN, 1.0, 2.0, 3.0).min = 1.0
Ordering.Double             List(1.0, 2.0, 3.0, Double.NaN).max = NaN
Ordering.Double             List(Double.NaN, 1.0, 2.0, 3.0).max = 3.0

实际上,元素的顺序变得相关(由于对于reduceLeft中的每个比较都返回false的结果). 左偏"显然可以解决此问题,从而获得一致的结果:

In fact the order of the elements becomes relevant (as a result of returning false for every comparison in the reduceLeft). "Left biasing" obviously solves this issue, leading to consistent results:

BiasedOrdering              List(1.0, 2.0, 3.0, Double.NaN).min = 1.0
BiasedOrdering              List(Double.NaN, 1.0, 2.0, 3.0).min = 1.0
BiasedOrdering              List(1.0, 2.0, 3.0, Double.NaN).max = 3.0
BiasedOrdering              List(Double.NaN, 1.0, 2.0, 3.0).max = 3.0

不幸的是,我仍然无法在这里完全回答所有问题.剩下的一些要点是:

Unfortunately, I'm still not able to fully answer all questions here. Some remaining points are:

  • 为什么Scala的默认顺序是按其方式定义的?目前,NaN的处理似乎存在很大缺陷. Ordering.Double的一个非常危险的细节是compare函数实际上是委派给java.lang.Double.compare的,而比较成员是基于该语言的本机比较来实现的.显然,这会导致结果不一致,例如:

  • Why is the Scala's default ordering defined the way it is? Currently handling of NaNs seems to be pretty flawed. A very dangerous detail of the Ordering.Double is that the compare function actually delegates to java.lang.Double.compare, while the comparison member are implemented based on the language's native comparisons. This obviously leads to inconsistent results, for instance:

Ordering.Double.compare(0.0, Double.NaN) == -1     // indicating 0.0 < NaN
Ordering.Double.lt     (0.0, Double.NaN) == false  // contradiction

  • 除了直接评估任何矛盾的比较之外,BiasedOrdering的潜在缺点是什么?快速检查sorted可获得以下结果,但没有发现任何麻烦:

  • What are potential drawbacks of the BiasedOrdering, apart from directly evaluating any contradicting comparison? A quick check on sorted gave the following results, which did not reveal any trouble:

    Ordering.Double             List(1.0, 2.0, 3.0, Double.NaN).sorted = List(1.0, 2.0, 3.0, NaN)
    OrderingDerivedFromCompare  List(1.0, 2.0, 3.0, Double.NaN).sorted = List(1.0, 2.0, 3.0, NaN)
    BiasedOrdering              List(1.0, 2.0, 3.0, Double.NaN).sorted = List(1.0, 2.0, 3.0, NaN)
    
    Ordering.Double             List(Double.NaN, 1.0, 2.0, 3.0).sorted = List(1.0, 2.0, 3.0, NaN)
    OrderingDerivedFromCompare  List(Double.NaN, 1.0, 2.0, 3.0).sorted = List(1.0, 2.0, 3.0, NaN)
    BiasedOrdering              List(Double.NaN, 1.0, 2.0, 3.0).sorted = List(1.0, 2.0, 3.0, NaN)
    

  • 暂时,我将继续进行这种偏左的排序.但是,由于问题的性质无法提供完美无缺的一般解决方案:请谨慎使用!

    For the time being I'll have a go with this left biased ordering. But since the nature of the problem does not allow a flawless general solution: use with care!

    更新

    就僧侣建议的基于隐式类的解决方案而言,我非常喜欢以下内容(因为它根本不会搞乱(有瑕疵的?)总订单,但会在内部转换为干净的完全有序的域):

    And in terms of solutions based on an implicit class as monkjack suggested, I like the following a lot (since it does not mess with (flawed?) total orders at all, but internally converts to a clean totally ordered domain):

    implicit class MinMaxNanAware(t: TraversableOnce[Double]) {
      def nanAwareMin = t.minBy(x => if (x.isNaN) Double.PositiveInfinity else x)
      def nanAwareMax = t.maxBy(x => if (x.isNaN) Double.NegativeInfinity else x)
    }
    
    // and now we can simply use
    val goodMin = list.nanAwareMin
    

    这篇关于包含NaN的集合的最小值/最大值(订购中的处理不可比性)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

    查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆