What is the difference between NaN and None?

Question

I am reading two columns of a csv file using pandas read_csv() and then assigning the values to a dictionary. The columns contain strings of numbers and letters. Occasionally a cell is empty. In my opinion, the value read into that dictionary entry should be None, but nan is assigned instead. Surely None is more descriptive of an empty cell, as it represents a null value, whereas nan just says that the value read is not a number.

Is my understanding correct? What is the difference between None and nan? Why is nan assigned instead of None?

Also, my check for empty cells in the dictionary has been using numpy.isnan():

for k, v in my_dict.iteritems():
    if np.isnan(v):

But this gives me an error saying that I cannot use this check for v. I guess that is because it is meant to be used on an integer or float variable, not a string. If this is true, how can I check v for an "empty cell"/nan case?

Recommended answer

NaN is used consistently as a placeholder for missing data in pandas, and that consistency is good. I usually read/translate NaN as "missing". See also the 'working with missing data' section in the docs.
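To make the situation from the question concrete, here is a minimal sketch of an empty CSV cell being read as NaN. The CSV content and the column names `key` and `code` are made up for illustration; the dictionary comprehension shows one possible way to substitute None afterwards if that is what you prefer to store.

```python
import io

import pandas as pd

# Hypothetical two-column CSV; the row for key "b" has an empty "code" cell.
csv_text = "key,code\na,12ab\nb,\nc,7xy\n"

df = pd.read_csv(io.StringIO(csv_text))

# The empty cell comes back as missing (NaN), even in a column of strings.
missing_mask = df["code"].isna().tolist()
print(missing_mask)

# Build the dictionary, replacing missing cells with None if desired.
my_dict = {k: (None if pd.isna(v) else v) for k, v in zip(df["key"], df["code"])}
print(my_dict)
```

Whether to keep NaN or convert to None in the dictionary is a design choice: NaN is what pandas uses internally, while None may read more naturally in plain Python code.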

Wes writes in the docs, under 'choice of NA-representation':

After years of production use [NaN] has proven, at least in my opinion, to be the best decision given the state of affairs in NumPy and Python in general. The special value NaN (Not-A-Number) is used everywhere as the NA value, and there are API functions isnull and notnull which can be used across the dtypes to detect NA values.
...
Thus, I have chosen the Pythonic "practicality beats purity" approach and traded integer NA capability for a much simpler approach of using a special value in float and object arrays to denote NA, and promoting integer arrays to floating when NAs must be introduced.

Note the "gotcha": an integer Series containing missing data is upcast to floats.
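The upcasting behaviour can be seen in a short sketch. The Series names here are illustrative only, and the example assumes the default dtype inference (no explicit dtype is passed).

```python
import numpy as np
import pandas as pd

# A Series of plain integers is stored with an integer dtype.
ints = pd.Series([1, 2, 3])
print(ints.dtype)

# Introducing a missing value promotes the data to float64,
# because NaN is itself a float.
with_missing = pd.Series([1, 2, None])
print(with_missing.dtype)

# Reindexing to a label that does not exist introduces NaN as well.
reindexed = ints.reindex([0, 1, 2, 3])
print(reindexed.dtype)
```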

In my opinion the main reason to use NaN (over None) is that it can be stored with numpy's float64 dtype, rather than the less efficient object dtype; see NA type promotions.

import numpy as np
import pandas as pd

# dtype=object is forced here; without it pandas changes None to NaN!
s_bad = pd.Series([1, None], dtype=object)
s_good = pd.Series([1, np.nan])

In [13]: s_bad.dtype
Out[13]: dtype('O')

In [14]: s_good.dtype
Out[14]: dtype('float64')

Jeff comments (below) on this:

np.nan allows for vectorized operations; it's a float value, while None, by definition, forces object type, which basically disables all efficiency in numpy.
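A small sketch of what this means in practice, using plain numpy arrays (the array names are illustrative): arithmetic on a float array containing NaN stays vectorized and simply propagates NaN, whereas the same operation on an object array containing None falls back to per-element Python operations and fails on the None.

```python
import numpy as np

float_arr = np.array([1.0, np.nan, 3.0])            # dtype float64
obj_arr = np.array([1.0, None, 3.0], dtype=object)  # dtype object

# Vectorized arithmetic works; NaN propagates through the result.
doubled = float_arr * 2
print(doubled)

# On the object array numpy has to evaluate None * 2 in Python,
# which is not a valid operation.
error = None
try:
    obj_arr * 2
except TypeError as exc:
    error = exc
print(type(error).__name__)
```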

So repeat 3 times fast: object==bad, float==good

That said, many operations may still work just as well with None as with NaN (but are perhaps not supported, i.e. they may sometimes give surprising results):

In [15]: s_bad.sum()
Out[15]: 1

In [16]: s_good.sum()
Out[16]: 1.0
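One difference that commonly causes surprises, independent of pandas, is how the two values behave in plain Python. The sketch below relies only on standard Python and numpy behaviour: None is a singleton of type NoneType, while NaN is an ordinary float that compares unequal to everything, including itself.

```python
import math

import numpy as np

none_type = type(None).__name__   # None has its own type
nan_type = type(np.nan).__name__  # NaN is just a float

# None is a singleton, so identity checks are reliable.
none_is_none = None is None

# NaN never compares equal, not even to itself, so `v == np.nan`
# cannot be used to detect it; use an isnan-style check instead.
nan_equals_nan = np.nan == np.nan
nan_detected = math.isnan(np.nan)

print(none_type, nan_type, none_is_none, nan_equals_nan, nan_detected)
```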

To answer the second question:
You should be using pd.isnull and pd.notnull to test for missing data (NaN).
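Applied to the dictionary loop from the question, that might look like the following sketch. The dictionary contents are invented for illustration. Unlike np.isnan, pd.isnull accepts strings and None as well as floats, so it can be used on mixed values.

```python
import numpy as np
import pandas as pd

# Hypothetical dictionary mixing strings, floats and missing values.
my_dict = {"a": "12ab", "b": np.nan, "c": None, "d": 3.5}

# np.isnan raises TypeError on a string, which is the error
# described in the question.
isnan_error = None
try:
    np.isnan("12ab")
except TypeError as exc:
    isnan_error = exc

# pd.isnull handles every value type here and treats both
# NaN and None as missing.
missing_keys = [k for k, v in my_dict.items() if pd.isnull(v)]
print(missing_keys)
```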
