Checking for NaN presence in a container

Question

NaN is handled perfectly when I check for its presence in a list or a set. But I don't understand how. [UPDATE: no it's not; it is reported as present if the identical instance of NaN is found; if only non-identical instances of NaN are found, it is reported as absent.]

  1. I thought presence in a list is tested by equality, so I expected NaN to not be found since NaN != NaN.
  2. hash(NaN) and hash(0) are both 0. How do dictionaries and sets tell NaN and 0 apart?
  3. Is it safe to check for NaN presence in an arbitrary container using the in operator? Or is it implementation dependent?

My question is about Python 3.2.1; but if there are any changes existing/planned in future versions, I'd like to know that too.

NaN = float('nan')
print(NaN != NaN) # True
print(NaN == NaN) # False

list_ = [1, 2, NaN]
print(NaN in list_) # True; works fine but how?

set_ = {1, 2, NaN}
print(NaN in set_) # True; hash(NaN) is some fixed integer, so no surprise here
print(hash(0)) # 0
print(hash(NaN)) # 0 in Python 3.2 (since Python 3.10, the hash of a NaN depends on object identity)
set_ = {1, 2, 0}
print(NaN in set_) # False; works fine, but how?

Note that if I add an instance of a user-defined class to a list, and then check for containment, the instance's __eq__ method is called (if defined) - at least in CPython. That's why I assumed that list containment is tested using operator ==.
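
A small CPython experiment makes both halves of that observation visible. The class `Probe` and its shared call counter are hypothetical, invented for this sketch: its `__eq__` always answers False and records every call. The counter lives on the class so it does not matter which operand's `__eq__` the list happens to invoke.

```python
class Probe:
    """Hypothetical probe: __eq__ always answers False and counts its calls."""

    eq_calls = 0  # shared across instances, so it doesn't matter whose __eq__ runs

    def __eq__(self, other):
        Probe.eq_calls += 1
        return False


a = Probe()
b = Probe()
items = [a]

print(b in items)      # False: b is not a, so __eq__ is consulted (and says False)
print(Probe.eq_calls)  # 1
print(a in items)      # True: the identity check succeeds before __eq__ is tried
print(Probe.eq_calls)  # still 1
print(a == a)          # False: the == operator itself does not short-circuit on identity
```

So __eq__ is indeed called for distinct objects, but an identical object is found without consulting it at all.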

Per Roman's answer, it would seem that __contains__ for list, tuple, set, dict behaves in a very strange way:

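# Simplified model: sets and dicts narrow the search by hash first instead of scanning
def __contains__(self, x):
  for element in self:
    if x is element:  # identity is tested first...
      return True
    if x == element:  # ...and equality only if identity fails
      return True
  return False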

I say 'strange' because I didn't see it explained in the documentation (maybe I missed it), and I think this is something that shouldn't be left as an implementation choice.

Of course, one NaN object may not be identical (in the sense of id) to another NaN object. (This is not really surprising; Python doesn't guarantee such identity. In fact, I never saw CPython share an instance of NaN created in different places, even though it shares instances of small integers and short strings.) This means that testing for NaN presence in a built-in container has no well-defined answer: it depends on whether the NaN you hold is the very object stored in the container.
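
To see why identity matters, compare NaN objects created separately with one that is reused. This sketch assumes Python 3.5 or later for `math.nan`; `math.nan` is a single module attribute, so every lookup returns the same object, whereas each `float('nan')` call builds a new object in CPython.

```python
import math

x = float('nan')
y = float('nan')

print(x is y)                      # False in CPython: two separately created NaN objects
print(x in [x])                    # True: same object, found by the identity check
print(y in [x])                    # False: different object, and NaN == NaN is False
print(math.nan in [math.nan])      # True: the module attribute is one shared object
print(float('nan') in [math.nan])  # False
```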

This is very dangerous, and very subtle. Someone might run the very code I showed above, and incorrectly conclude that it's safe to test for NaN membership using in.

I don't think there is a perfect workaround to this issue. One very safe approach is to ensure that NaNs are never added to built-in containers. (It's a pain to check for that all over the code...)
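
One way to enforce that approach where data enters a container is a validating helper. `checked_set` is a hypothetical name for this sketch, not a standard-library function. It only inspects `float` values, so NaNs of other types such as `decimal.Decimal('NaN')` or `complex('nan')` are not caught.

```python
import math
from typing import Hashable, Iterable


def checked_set(values: Iterable[Hashable]) -> set:
    """Hypothetical helper: build a set, refusing any float NaN."""
    result = set()
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            raise ValueError("NaN values are not allowed in this set")
        result.add(v)
    return result


print(checked_set([1, 2.5, 'a']))  # builds the set normally
try:
    checked_set([1, float('nan')])
except ValueError as exc:
    print(exc)  # NaN values are not allowed in this set
```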

Another alternative is to watch out for cases where the in operator might have a NaN on its left side, and in such cases, test for NaN membership separately, using math.isnan(). In addition, other operations (e.g., set intersection) need to be avoided or rewritten as well.
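
The second approach could look like the helper below. `nan_aware_in` is a hypothetical name chosen for this sketch; it treats any float NaN on the left side as matching any float NaN in the container, which is one possible policy rather than standard Python semantics.

```python
import math
from collections.abc import Iterable


def _is_float_nan(v) -> bool:
    return isinstance(v, float) and math.isnan(v)


def nan_aware_in(x, container: Iterable) -> bool:
    """Like `x in container`, but any float NaN matches any other float NaN."""
    if _is_float_nan(x):
        # Scan by value, since identity and == are both useless for distinct NaNs
        return any(_is_float_nan(e) for e in container)
    return x in container


a = float('nan')
b = float('nan')
print(b in [1, a])              # False: b is a different NaN object
print(nan_aware_in(b, [1, a]))  # True
print(nan_aware_in(0, {1, 2}))  # False: non-NaN values fall back to the normal `in`
```

For a dict, iteration yields keys, so the helper checks keys just like the built-in `in` does.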

Answer

Question #1: why is NaN found in a container when it's an identical object?

From the documentation:

For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression x in y is equivalent to any(x is e or x == e for e in y).
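
The documented rule can be checked directly. This sketch compares the built-in `in` with the expression `any(x is e or x == e for e in y)` for a few NaN cases across the container types named in the quote; `documented_in` is a hypothetical helper name, and the check is an illustration rather than an exhaustive test.

```python
from collections import deque

nan1 = float('nan')
nan2 = float('nan')


def documented_in(x, y):
    """The expression the documentation gives for `x in y`."""
    return any(x is e or x == e for e in y)


containers = [
    [1, nan1],
    (1, nan1),
    {1, nan1},
    frozenset({1, nan1}),
    {1: 'a', nan1: 'b'},  # iterating a dict yields its keys
    deque([1, nan1]),
]

for y in containers:
    for x in (nan1, nan2, 1, 0):
        assert (x in y) == documented_in(x, y), (type(y).__name__, x)

print("built-in `in` matched the documented expression in every case")
```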

This is precisely what I observe with NaN, so everything is fine. Why this rule? I suspect it's because a dict/set wants to honestly report that it contains a certain object if that object is actually in it (even if __eq__() for whatever reason chooses to report that the object is not equal to itself).
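
That "honest reporting" can be sketched with a hypothetical class `Stubborn` whose `__eq__` denies being equal even to itself. Because overriding `__eq__` disables the inherited hash, `__hash__` is restored explicitly here (to `object.__hash__`, i.e. identity-based) so instances can go into a set or dict.

```python
class Stubborn:
    """Hypothetical class: never equal to anything, including itself."""

    def __eq__(self, other):
        return False

    __hash__ = object.__hash__  # identity-based hash, restored after overriding __eq__


s = Stubborn()
print(s == s)             # False
print(s in {s})           # True: the set finds the very same object
print({s: 1}[s])          # 1: dict lookup succeeds for the same reason
print(Stubborn() in {s})  # False: a different instance, and __eq__ says no
```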

Question #2: why is the hash value for NaN the same as for 0?

From the documentation:

Called by built-in function hash() and for operations on members of hashed collections including set, frozenset, and dict. hash() should return an integer. The only required property is that objects which compare equal have the same hash value; it is advised to somehow mix together (e.g. using exclusive or) the hash values for the components of the object that also play a part in comparison of objects.

Note that the requirement is only in one direction; objects that have the same hash do not have to be equal! At first I thought it was a typo, but then I realized that it's not. Hash collisions happen anyway, even with the default __hash__() (see an excellent explanation here). The containers handle collisions without any problem. They do, of course, ultimately use the == operator (after the identity check) to compare elements, hence they can easily end up with multiple values of NaN, as long as they are not identical! Try this:
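
A concrete collision is easy to find in CPython: it reserves `-1` as an error code for hash functions, so `hash(-1)` comes out as `-2`, the same as `hash(-2)` (a CPython implementation detail, not a language guarantee). The sketch shows that a set still keeps both values, and likewise keeps `0` and a NaN apart whatever their hashes are.

```python
nan = float('nan')

print(hash(-1) == hash(-2))  # True in CPython: both hash to -2
print(len({-1, -2}))         # 2: equal hashes, but -1 != -2, so both are kept

print(len({0, nan}))         # 2: 0 == nan is False, so they are distinct elements
print(nan in {0})            # False: at most a hash collision, never an equality match
```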

>>> nan1 = float('nan')
>>> nan2 = float('nan')
>>> d = {}
>>> d[nan1] = 1
>>> d[nan2] = 2
>>> d[nan1]
1
>>> d[nan2]
2

So everything works as documented. But... it's very, very dangerous! How many people knew that multiple values of NaN could live alongside each other in a dict? How many people would find this easy to debug?
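
The same effect shows up in ordinary code that never mentions dict keys explicitly, for example when counting or deduplicating values. This sketch uses `collections.Counter` and a set comprehension; it relies on each `float('nan')` call producing a separate object, which is what CPython does.

```python
import math
from collections import Counter

values = [1.0, float('nan'), float('nan'), 1.0]

counts = Counter(values)
print(len(counts))  # 3: one key for 1.0 and one key per NaN object
print(counts[1.0])  # 2

unique = {v for v in values}
print(len(unique))  # 3

# Counting NaNs explicitly with math.isnan avoids relying on object identity
print(sum(1 for v in values if math.isnan(v)))  # 2
```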

I would recommend making NaN an instance of a subclass of float that doesn't support hashing and hence cannot be accidentally added to a set/dict. I'll submit this to python-ideas.
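
A rough sketch of what that proposal could look like in user code. `UnhashableFloat` is a hypothetical name; nothing like it exists in the standard library. Setting `__hash__ = None` in a class body is the standard way to make instances unhashable, so they raise `TypeError` when used as set elements or dict keys.

```python
class UnhashableFloat(float):
    """Hypothetical float subclass whose instances cannot be hashed."""

    __hash__ = None


NaN = UnhashableFloat('nan')

print(NaN != NaN)  # True: otherwise it still behaves like a float NaN
try:
    {NaN}
except TypeError as exc:
    print("rejected:", exc)  # the set refuses the unhashable value
```

One limitation of this design: arithmetic on such a value (for example `NaN + 1`) returns a plain, hashable `float`, so the protection covers only the original object.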

Finally, I found this in the documentation:

For user-defined classes which do not define __contains__() but do define __iter__(), x in y is true if some value z with x == z is produced while iterating over y. If an exception is raised during the iteration, it is as if in raised that exception.

Lastly, the old-style iteration protocol is tried: if a class defines __getitem__(), x in y is true if and only if there is a non-negative integer index i such that x == y[i], and all lower integer indices do not raise IndexError exception. (If any other exception is raised, it is as if in raised that exception).

You may notice that there is no mention of is here, unlike with built-in containers. I was surprised by this, so I tried:

>>> nan1 = float('nan')
>>> nan2 = float('nan')
>>> class Cont:
...   def __iter__(self):
...     yield nan1
...
>>> c = Cont()
>>> nan1 in c
True
>>> nan2 in c
False

As you can see, identity is checked first, before ==, consistent with the built-in containers. I'll submit a report to fix the docs.
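
The old-style __getitem__ protocol quoted above can be probed in the same spirit. `Seq` is a hypothetical class for this sketch: it defines neither `__iter__` nor `__contains__`, returns one NaN at index 0, and raises `IndexError` afterwards, which ends the search. In CPython the identity check applies here as well.

```python
nan1 = float('nan')
nan2 = float('nan')


class Seq:
    """Hypothetical sequence with no __iter__ or __contains__, only __getitem__."""

    def __getitem__(self, i):
        if i == 0:
            return nan1
        raise IndexError(i)  # signals the end of the sequence


s = Seq()
print(nan1 in s)  # True: found by identity, although nan1 == nan1 is False
print(nan2 in s)  # False
```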
