包含NaN的集合的最小值/最大值(订购中的处理不可比性) [英] min/max of collections containing NaN (handling incomparability in ordering)
问题描述
由于以下行为,我刚遇到一个令人讨厌的错误:
I just ran into a nasty bug as a result of the following behavior:
scala> List(1.0, 2.0, 3.0, Double.NaN).min
res1: Double = NaN
scala> List(1.0, 2.0, 3.0, Double.NaN).max
res2: Double = NaN
我知道,对于成对比较,有时最好具有max(NaN, 0) = NaN
,这可能是java.lang.Double.compare
遵循此约定的原因(似乎是min和max
函数退化为检查集合是否包含NaN.由于存在更多适当的方法来检查NaN的存在(例如collection.find(_.isNaN)
),因此最好在集合上保持语义上有意义的最小/最大.
I understand that for a pairwise comparison it may sometimes be preferable to have max(NaN, 0) = NaN
and this is probably the reason why java.lang.Double.compare
follows this convention (there seems to be an IEEE standard for that). For a collection however, I really think that this is a strange convention. After all the above collection does contain valid numbers; and these numbers have a clear maximum and minimum. In my opinion, the conception that the maximum number of the collection is not a number is a contradiction since, well, NaN is not a number, so it cannot be the maximum or minimum "number" of a collection -- unless there are no valid numbers at all; in this case it makes perfect sense that the maximum "is not a number". Semantically the min
and max
functions degenerate to a check whether the collection contains a NaN. Since there are more appropriate ways to check for the existence of NaNs (e.g. collection.find(_.isNaN)
) it would be great to maintain a semantically meaningful min/max on collections.
所以我的问题是:获得忽略NaNs存在的行为的最佳方法是什么?我看到两种可能性:
So my question is: What is the best approach to obtain the behavior to ignore the existence of NaNs? I see two possibilities:
-
在调用最小/最大之前过滤NaN.由于这需要在所有地方对问题进行明确处理,并且可能会导致性能下降,因此,我宁愿选择更简单的方法.
Filtering NaNs before calling min/max. Since this requires explicit handling of the issue in all places and may incur performance penalties I would prefer something easier.
最好有一种可以忽略NaN的排序,可以在需要时将其用作隐式排序.我尝试了以下方法:
It would be great to have a kind of NaN-ignoring ordering which can be used as an implicit ordering wherever necessary. I tried the following:
object NanAwareOrdering extends Ordering[Double] {
def compare(x: Double, y: Double) = {
if (x.isNaN()) {
+1 // without checking x, return y < x
} else if (y.isNaN()) {
-1 // without checking y, return x < y
} else {
java.lang.Double.compare(x, y)
}
}
}
但是,这种方法似乎取决于我是否对找到最小值或最大值感兴趣,即:
However, this approach seems to depend on whether I'm interesting in finding the min or max, i.e.:
scala> List(1.0, 2.0, 3.0, Double.NaN).min(NanAwareOrdering)
res7: Double = 1.0
scala> List(1.0, 2.0, 3.0, Double.NaN).max(NanAwareOrdering)
res8: Double = NaN
这意味着我将必须具有两个NanAwareOrdering,具体取决于我想要的是最小值还是最大值,而这将是不允许使用implicit val
的.因此,我的问题是:如何定义一种可以同时处理两种情况的排序?
This means that I would have to have two NanAwareOrdering depending on whether I want the minimum or the maximum, which would forbid to have an implicit val
. Therefore my question is: How can I define an ordering in a way to handle both cases at once?
更新:
出于完整性考虑:在分析问题的过程中,我意识到前提退化为NaN检查"实际上是错误的.实际上,我认为这更加难看:
For the sake of completeness: In the course of analyzing the issue I realized that the premise "degenerates to a check for NaN" is actually wrong. In fact, I think it is even more ugly:
scala> List(1.0, Double.NaN).min
res1: Double = NaN
scala> List(Double.NaN, 1.0).min
res2: Double = 1.0
推荐答案
免责声明:我将在问题中添加自己的答案,以防万一其他人仍然对这个问题的更多细节感兴趣.
我看起来这个问题比我预期的要复杂.正如阿列克谢·罗曼诺夫(Alexey Romanov)所指出的那样,不可比性的概念要求max/min函数采取部分顺序.不幸的是,Alexey也是正确的,即基于偏序的一般max/min函数没有意义:考虑一下偏序仅定义某些组内关系的情况,但组本身完全独立于彼此(例如,元素{a,b,c,d}仅具有两个关系a< b和c< d;我们将有两个max/min).在这方面,甚至可能有人争辩说,正式的max/min应该总是 返回两个值NaN 和各自的有效min/max,因为NaN本身也是其极值自己的关系组.
I looks like this issue is more complex than I expected. As Alexey Romanov has already pointed out, the notion of incomparability would require the max/min functions to take a partial order. Unfortunately, Alexey is also right in the point, that a general max/min function based on a partial order does not make sense: Think of the case where the partial ordering only defines relations within certain groups, but the groups themselves are completely independent from each other (for instance, the elements {a, b, c, d} with just the two relations a < b and c < d; we would have two max/min). In that regard one may even argue that formally max/min should always return two values, NaN and the respective valid min/max since NaN itself is also an extremal value in its own relation group.
因此,由于部分订单过于笼统/过于复杂,因此min/max函数采用Ordering
.不幸的是,总订单不允许出现不可比性的概念.回顾总订单的三个定义属性,很明显地,忽略NaN"在形式上是不可能的:
So as a result of partial orders being too general/complex, the min/max functions take an Ordering
. Unfortunately a total order does not allow the notion of incomparability. Reviewing the three defining properties of a total order makes it pretty obvious that "ignoring NaNs" is formally impossible:
- 如果a≤b并且b≤a,则a = b(反对称)
- 如果a≤b且b≤c,则a≤c(传递性)
- a≤b或b≤a(总计)
...并练习...
因此,当尝试提出Ordering
的实现以实现我们所需的最小/最大行为时,很明显,我们必须违反某些规则(并承担后果).在TraversableOnce
中min
/max
/minBy
/maxBy
的实现遵循以下模式(对于min
):
... and practice ...
So when trying to come up with an implementation of an Ordering
to fulfill our desired min/max behavior it is clear that we have to violate something (and bear the consequences). The implementation of min
/max
/minBy
/maxBy
in TraversableOnce
follows the pattern (for min
):
reduceLeft((x, y) => if (cmp.lteq(x, y)) x else y)
和 and 这样的左偏"排序的最终实现看起来像这样: The resulting implementation of such a "left biased" ordering would look like this:
当前我正在尝试找出:
Currently I'm trying to find out: 我正在将其与Scala的默认顺序 I'm comparing this to Scala's default order Scala默认顺序 One interesting property of Scala's default order 我们可以看到: 由于我们当前的实际问题,最小/最大函数起作用.对于 Now to our actual problem at hand, the min/max functions. For 现在是Scala的默认排序.令我震惊的是,这种情况实际上比我的问题中提到的还要复杂: Now to Scala's default ordering. I was deeply shocked to see that the situation is actually even more intricate than mentioned in my question: 实际上,元素的顺序变得相关(由于对于 In fact the order of the elements becomes relevant (as a result of returning 不幸的是,我仍然无法在这里完全回答所有问题.剩下的一些要点是: Unfortunately, I'm still not able to fully answer all questions here. Some remaining points are: 为什么Scala的默认顺序是按其方式定义的?目前,NaN的处理似乎存在很大缺陷. Why is the Scala's default ordering defined the way it is? Currently handling of NaNs seems to be pretty flawed. A very dangerous detail of the
除了直接评估任何矛盾的比较之外, What are potential drawbacks of the gteq
.这给了我左偏"比较的想法,即:gteq
for the max
variants. This gave me the idea of "left biasing" the comparison, i.e.:x <comparison_operator> NaN is always true to keep x in the reduction
NaN <comparison_operator> x is always false to inject x into the reduction
object BiasedOrdering extends Ordering[Double] {
def compare(x: Double, y: Double) = java.lang.Double.compare(x, y) // this is inconsistent, but the same goes for Double.Ordering
override def lteq(x: Double, y: Double): Boolean = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) true else compare(x, y) <= 0
override def gteq(x: Double, y: Double): Boolean = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) true else compare(x, y) >= 0
override def lt(x: Double, y: Double): Boolean = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) false else compare(x, y) < 0
override def gt(x: Double, y: Double): Boolean = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) false else compare(x, y) > 0
override def equiv(x: Double, y: Double): Boolean = if (x.isNaN() && !y.isNaN) false else if (!x.isNaN() && y.isNaN) true else if (x.isNaN() && y.isNaN) true else compare(x, y) == 0
}
...已分析:
... analyzed:
Ordering.Double
以及直接从java.lang.Double.compare
得出的以下顺序进行比较:Ordering.Double
and the following ordering which is directly derived from java.lang.Double.compare
:object OrderingDerivedFromCompare extends Ordering[Double] {
def compare(x: Double, y: Double) = {
java.lang.Double.compare(x, y)
}
}
Ordering.Double
的一个有趣特性是它会使用语言的本机数值比较运算符(<
,<=
,==
,>=
,>
)覆盖所有比较成员函数,因此比较结果完全相同,就好像我们将直接与这些运算符进行比较一样.下面显示了NaN和三个顺序的有效数字之间的所有可能关系:Ordering.Double
is that it overwrites all comparison member functions by the language's native numerical comparison operators (<
, <=
, ==
, >=
, >
) so the comparison results are identical as if we would compare directly with these operators. The following shows all possible relations between a NaN and a valid number for the three orderings:Ordering.Double 0.0 > NaN = false
Ordering.Double 0.0 >= NaN = false
Ordering.Double 0.0 == NaN = false
Ordering.Double 0.0 <= NaN = false
Ordering.Double 0.0 < NaN = false
OrderingDerivedFromCompare 0.0 > NaN = false
OrderingDerivedFromCompare 0.0 >= NaN = false
OrderingDerivedFromCompare 0.0 == NaN = false
OrderingDerivedFromCompare 0.0 <= NaN = true
OrderingDerivedFromCompare 0.0 < NaN = true
BiasedOrdering 0.0 > NaN = true
BiasedOrdering 0.0 >= NaN = true
BiasedOrdering 0.0 == NaN = true
BiasedOrdering 0.0 <= NaN = true
BiasedOrdering 0.0 < NaN = true
Ordering.Double NaN > 0.0 = false
Ordering.Double NaN >= 0.0 = false
Ordering.Double NaN == 0.0 = false
Ordering.Double NaN <= 0.0 = false
Ordering.Double NaN < 0.0 = false
OrderingDerivedFromCompare NaN > 0.0 = true
OrderingDerivedFromCompare NaN >= 0.0 = true
OrderingDerivedFromCompare NaN == 0.0 = false
OrderingDerivedFromCompare NaN <= 0.0 = false
OrderingDerivedFromCompare NaN < 0.0 = false
BiasedOrdering NaN > 0.0 = false
BiasedOrdering NaN >= 0.0 = false
BiasedOrdering NaN == 0.0 = false
BiasedOrdering NaN <= 0.0 = false
BiasedOrdering NaN < 0.0 = false
Ordering.Double NaN > NaN = false
Ordering.Double NaN >= NaN = false
Ordering.Double NaN == NaN = false
Ordering.Double NaN <= NaN = false
Ordering.Double NaN < NaN = false
OrderingDerivedFromCompare NaN > NaN = false
OrderingDerivedFromCompare NaN >= NaN = true
OrderingDerivedFromCompare NaN == NaN = true
OrderingDerivedFromCompare NaN <= NaN = true
OrderingDerivedFromCompare NaN < NaN = false
BiasedOrdering NaN > NaN = false
BiasedOrdering NaN >= NaN = true
BiasedOrdering NaN == NaN = true
BiasedOrdering NaN <= NaN = true
BiasedOrdering NaN < NaN = false
OrderingDerivedFromCompare
满足总订单属性.根据此结果 java.lang.Double.compare
背后的原因变得更加清楚:将NaN放在总订单的上端只是避免了任何矛盾!false
,而偏向顺序则取决于位置.由于两者都导致矛盾,因此很难看出哪些可能导致更严重的问题.
OrderingDerivedFromCompare
fulfills the total order properties. Based on this result the reasoning behind java.lang.Double.compare
becomes much more clear: Placing NaN at the upper end of the total order simply avoids any contradiction!false
, while for the biased order it depends on the position. Since both lead to contradictions it is difficult to see which may lead to more severe issues.OrderingDerivedFromCompare
,现在很清楚我们必须获得什么-NaN只是最大值,因此无论列表中元素的排列方式如何,很显然要以maxN的形式获得它:OrderingDerivedFromCompare
it is now clear what we have to obtain -- NaN is simply the largest value, so it's clear to obtain it as max, irrespective of how the elements in the list are arranged:OrderingDerivedFromCompare List(1.0, 2.0, 3.0, Double.NaN).min = 1.0
OrderingDerivedFromCompare List(Double.NaN, 1.0, 2.0, 3.0).min = 1.0
OrderingDerivedFromCompare List(1.0, 2.0, 3.0, Double.NaN).max = NaN
OrderingDerivedFromCompare List(Double.NaN, 1.0, 2.0, 3.0).max = NaN
Ordering.Double List(1.0, 2.0, 3.0, Double.NaN).min = NaN
Ordering.Double List(Double.NaN, 1.0, 2.0, 3.0).min = 1.0
Ordering.Double List(1.0, 2.0, 3.0, Double.NaN).max = NaN
Ordering.Double List(Double.NaN, 1.0, 2.0, 3.0).max = 3.0
reduceLeft
中的每个比较都返回false
的结果). 左偏"显然可以解决此问题,从而获得一致的结果:false
for every comparison in the reduceLeft
). "Left biasing" obviously solves this issue, leading to consistent results:BiasedOrdering List(1.0, 2.0, 3.0, Double.NaN).min = 1.0
BiasedOrdering List(Double.NaN, 1.0, 2.0, 3.0).min = 1.0
BiasedOrdering List(1.0, 2.0, 3.0, Double.NaN).max = 3.0
BiasedOrdering List(Double.NaN, 1.0, 2.0, 3.0).max = 3.0
Ordering.Double
的一个非常危险的细节是compare
函数实际上是委派给java.lang.Double.compare
的,而比较成员是基于该语言的本机比较来实现的.显然,这会导致结果不一致,例如:
Ordering.Double
is that the compare
function actually delegates to java.lang.Double.compare
, while the comparison member are implemented based on the language's native comparisons. This obviously leads to inconsistent results, for instance:Ordering.Double.compare(0.0, Double.NaN) == -1 // indicating 0.0 < NaN
Ordering.Double.lt (0.0, Double.NaN) == false // contradiction
BiasedOrdering
的潜在缺点是什么?快速检查sorted
可获得以下结果,但没有发现任何麻烦:BiasedOrdering
, apart from directly evaluating any contradicting comparison? A quick check on sorted
gave the following results, which did not reveal any trouble:Ordering.Double List(1.0, 2.0, 3.0, Double.NaN).sorted = List(1.0, 2.0, 3.0, NaN)
OrderingDerivedFromCompare List(1.0, 2.0, 3.0, Double.NaN).sorted = List(1.0, 2.0, 3.0, NaN)
BiasedOrdering List(1.0, 2.0, 3.0, Double.NaN).sorted = List(1.0, 2.0, 3.0, NaN)
Ordering.Double List(Double.NaN, 1.0, 2.0, 3.0).sorted = List(1.0, 2.0, 3.0, NaN)
OrderingDerivedFromCompare List(Double.NaN, 1.0, 2.0, 3.0).sorted = List(1.0, 2.0, 3.0, NaN)
BiasedOrdering List(Double.NaN, 1.0, 2.0, 3.0).sorted = List(1.0, 2.0, 3.0, NaN)
暂时,我将继续进行这种偏左的排序.但是,由于问题的性质无法提供完美无缺的一般解决方案:请谨慎使用!
For the time being I'll have a go with this left biased ordering. But since the nature of the problem does not allow a flawless general solution: use with care!
更新
就僧侣建议的基于隐式类的解决方案而言,我非常喜欢以下内容(因为它根本不会搞乱(有瑕疵的?)总订单,但会在内部转换为干净的完全有序的域):
And in terms of solutions based on an implicit class as monkjack suggested, I like the following a lot (since it does not mess with (flawed?) total orders at all, but internally converts to a clean totally ordered domain):
implicit class MinMaxNanAware(t: TraversableOnce[Double]) {
def nanAwareMin = t.minBy(x => if (x.isNaN) Double.PositiveInfinity else x)
def nanAwareMax = t.maxBy(x => if (x.isNaN) Double.NegativeInfinity else x)
}
// and now we can simply use
val goodMin = list.nanAwareMin
这篇关于包含NaN的集合的最小值/最大值(订购中的处理不可比性)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!