-9999 as missing value with numpy.genfromtxt()

Problem description

Let's say I have a dumb text file with the contents:

Year    Recon   Observed
1505    162.38        23      
1506     46.14     -9999      
1507    147.49     -9999      

-9999 is used to denote a missing value (don't ask).

So, I should be able to read this into a NumPy array with:

import numpy as np
x = np.genfromtxt("file.txt", dtype = None, names = True, missing_values = -9999)

And have all my little -9999s turn into numpy.nan. But instead, I get:

>>> x
array([(1409, 112.38, 23), (1410, 56.14, -9999), (1411, 145.49, -9999)], 
  dtype=[('Year', '<i8'), ('Recon', '<f8'), ('Observed', '<i8')])

... That's not right...

Am I missing something?

Recommended answer

Nope, you're not doing anything wrong. Using the missing_values argument indeed tells np.genfromtxt that the corresponding values should be flagged as "missing/invalid". The problem is that dealing with missing values is only supported if you use the usemask=True argument (I probably should have made that clearer in the documentation, my bad).
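A minimal sketch of the usemask=True call, assuming the question's sample table is held in an in-memory string rather than "file.txt" (the io.StringIO wrapper and the read_masked helper name are illustrative, not part of the question). The -9999 entries should come back flagged in the mask, while the raw values stay untouched underneath it:

```python
import io

import numpy as np

# Sample data from the question, kept in memory instead of "file.txt".
TEXT = """Year    Recon   Observed
1505    162.38        23
1506     46.14     -9999
1507    147.49     -9999
"""


def read_masked(text):
    """Read the table, flagging every -9999 entry as missing (hypothetical helper)."""
    return np.genfromtxt(
        io.StringIO(text),
        dtype=None,            # guess each column's type from its contents
        names=True,            # take field names from the header line
        missing_values=-9999,  # sentinel to treat as missing
        usemask=True,          # return a MaskedArray that records the flags
    )


x = read_masked(TEXT)
print(x["Observed"].mask)  # True where the file had -9999
print(x["Observed"].data)  # the raw -9999s are still stored under the mask
```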

With usemask=True, the output is a masked array. You can turn it into a regular ndarray, with the missing values replaced by np.nan, by calling its .filled(np.nan) method.
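For example, assuming the sample table is read as a plain 2-D float array (skipping the header line instead of using names=True; the read_with_nans helper is hypothetical), the masked result converts like this:

```python
import io

import numpy as np

TEXT = """Year    Recon   Observed
1505    162.38        23
1506     46.14     -9999
1507    147.49     -9999
"""


def read_with_nans(text):
    """Return a plain float ndarray with every -9999 replaced by NaN."""
    masked = np.genfromtxt(
        io.StringIO(text),
        dtype=float,           # all columns float, so NaN is representable
        skip_header=1,         # skip the "Year Recon Observed" line
        missing_values=-9999,
        usemask=True,
    )
    # .filled() returns an ordinary ndarray with masked slots set to NaN
    return masked.filled(np.nan)


arr = read_with_nans(TEXT)
print(type(arr).__name__)  # ndarray
print(np.isnan(arr))       # True only in the last column of rows 2 and 3
```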

Be careful, though: if you have a column that was detected as having an int dtype and you try to fill its missing values with np.nan, you won't get what you expect (np.nan is only supported for float columns).
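One way to sidestep this, assuming it is acceptable for every column (Year included) to be stored as float, is to request dtype=float so that no field is guessed as an integer before calling .filled(np.nan). The read_float_table name is hypothetical:

```python
import io

import numpy as np

TEXT = """Year    Recon   Observed
1505    162.38        23
1506     46.14     -9999
1507    147.49     -9999
"""


def read_float_table(text):
    """Read the table as named float fields, turning -9999 into NaN."""
    masked = np.genfromtxt(
        io.StringIO(text),
        dtype=float,           # force float fields instead of guessing int
        names=True,
        missing_values=-9999,
        usemask=True,
    )
    return masked.filled(np.nan)


# With dtype=None, "Observed" is guessed as an integer field,
# which has no way to represent NaN.
guessed = np.genfromtxt(io.StringIO(TEXT), dtype=None, names=True)
print(guessed.dtype["Observed"].kind)  # i (integer)

table = read_float_table(TEXT)
print(table["Observed"])  # 23.0 followed by two NaNs
```

Forcing float on every column is the bluntest option. A per-column dtype could keep Year as an integer, but then a single .filled(np.nan) over the whole record array would also try to put NaN into the integer field, so the NaN fill would likely need to be applied only to the float fields.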
