Python NaN's in set and uniqueness

Problem description

I just stumbled across this interesting behavior of Python involving NaN's in sets:

# Test 1
nan = float('nan')
things = [0, 1, 2, nan, 'a', 1, nan, 'a', 2, nan, nan]
unique = set(things)
print(unique)  # {0, 1, 2, nan, 'a'}

# Test 2
things = [0, 1, 2, float('nan'), 'a', 1, float('nan'), 'a', 2, float('nan'), float('nan')]
unique = set(things)
print(unique)  # {0, 1, 2, nan, nan, nan, nan, 'a'}

That the same key nan shows up multiple times within the last set of course seems strange.

I believe this is caused by nan not being equal to itself (as defined by IEEE 754), together with the fact that objects are compared based on memory location (id()) prior to equality of values, when adding objects to a set. It then appears that each float('nan') results in a fresh object, rather than returning some global "singleton" float (as is done for e.g. the small integers).

  • In fact I just found this SO question describing the same behavior, seemingly confirming the above.
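A quick check of the explanation above, assuming CPython: lists and sets test identity before calling ==, so the same NaN object can be found again, while a freshly created NaN cannot. The name nan is just a local variable for illustration.

```python
nan = float('nan')

# NaN never compares equal to anything, itself included (IEEE 754).
print(nan == nan)                  # False

# Containment and counting check identity before equality,
# so the very same NaN object is still found...
print(nan in [nan])                # True
print([nan, nan].count(nan))       # 2
print(nan in {nan})                # True

# ...while a separately created NaN is neither identical nor equal.
print(float('nan') in [nan])       # False
print([float('nan')].count(nan))   # 0
print(float('nan') in {nan})       # False
```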

Questions:

  1. Is this really desired behavior?
  2. Say I was given the second things from above. How would I go about counting the number of actually unique elements? The usual len(set(things)) obviously does not work. I can in fact use import numpy as np; len(np.unique(things)), but I would like to know if this can be done without using third-party libraries.

Addendum

As a small addendum, I'd add that dicts have a similar story:

d = {float('nan'): 0, float('nan'): 1}
print(d)  # {nan: 0, nan: 1}

I was under the impression that NaN's were a total no-go as keys in dicts, but it does actually work out as long as you store references to the exact objects used as keys:

nan0 = float('nan')
nan1 = float('nan')
d = {nan0: 0, nan1: 1}
d[float('nan')]  # KeyError
d[nan0]  # 0
d[nan1]  # 1

Surely hacky, but I can see this trick being useful if one needs to store additional values in an existing dict and does not care which keys are used, except of course that each new key must not already be in the dict. That is, one can use float('nan') as a factory for generating an unending supply of new dict keys, guaranteed never to collide with each other or with any existing or future keys.
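The "NaN as a key factory" idea can be sketched as follows. add_anonymous is a hypothetical helper, and the caller must keep the returned key object, because a new float('nan') will never find the entry again. Two caveats: before Python 3.10 every NaN hashed to 0, so many such keys share one hash value and lookups slow down; since 3.10 NaN hashes depend on object identity. A plain object() instance is a common, more explicit way to mint keys that are only equal to themselves.

```python
def add_anonymous(d, value):
    """Store value under a brand-new NaN key and return that key.

    Each float('nan') call yields a fresh object that is unequal to every
    other key, so it cannot overwrite anything already in d.
    """
    key = float('nan')
    d[key] = value
    return key

d = {'existing': 0, 0: 'zero'}
k1 = add_anonymous(d, 'first')
k2 = add_anonymous(d, 'second')
print(len(d))         # 4
print(d[k1], d[k2])   # first second
```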

Recommended answer

The intended behavior of float() is to return a new instance of the float class. And you're right: NaN is not equal to itself. Thus float(1) == float(1), whereas float('nan') != float('nan').
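A small illustration of the point above: two separately created floats deduplicate in a set only when they compare equal. That works for 1.0 but never for NaN. Because == cannot detect NaN, math.isnan() (or the x != x idiom) is the usual test. The variable names are illustrative only.

```python
import math

one_a, one_b = float(1), float(1)
nan_a, nan_b = float('nan'), float('nan')

# Ordinary floats compare by value, so two separate 1.0 objects collapse in a set.
print(one_a == one_b, len({one_a, one_b}))   # True 1

# NaN compares unequal even to itself, so two NaN objects both survive.
print(nan_a != nan_b, nan_a == nan_a)        # True False
print(len({nan_a, nan_b}))                   # 2

# Because == cannot detect NaN, use math.isnan() (or the x != x idiom).
print(math.isnan(nan_a), nan_a != nan_a)     # True True
```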

To get a unique set, I'd recommend establishing a single nan constant, as you did in Test 1. If that won't work for you, detect NaNs with math.isnan() (after import math) and remove them while iterating over the list (or set). Because math.isnan() raises TypeError for non-numbers such as 'a', only test floats: newlist = [x for x in things if not (isinstance(x, float) and math.isnan(x))]

You might object: "But that removes every NaN. What if there was one in the list to begin with?" In that case, add a single NaN back after filtering:

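import math

things = [0, 1, 2, float('nan'), 'a', 1, float('nan'), 'a', 2, float('nan'), float('nan')]
nan = float('nan')  # single NaN object to stand in for all the removed ones
length = len(things)
# Drop only float NaNs; keep strings such as 'a' (math.isnan('a') would raise TypeError)
newlist = [x for x in things if not (isinstance(x, float) and math.isnan(x))]
if len(newlist) != length:
    newlist.append(nan)  # at least one NaN was removed, so add exactly one back (or handle it however you like)
unique = set(newlist)
print(unique)

Output (element order may vary): {0, 1, 2, nan, 'a'}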
