Filling missing values using numpy.genfromtxt


Question

Despite the advice from the previous questions:

-9999 as missing value with numpy.genfromtxt()

Using genfromtxt to import csv data with missing values in numpy

I am still unable to process a text file that ends with a missing value:

a.txt:

1 2 3
4 5 6
7 8

I've tried multiple arrangements of the missing_values and filling_values options, but cannot get this to work:

import numpy as np

sol = np.genfromtxt("a.txt", 
                    dtype=float,
                    invalid_raise=False, 
                    missing_values=None,
                    usemask=True,
                    filling_values=0.0)
print(sol)

What I would like to get is:

[[1.0 2.0 3.0]
 [4.0 5.0 6.0]
 [7.0 8.0 0.0]]

But what I get instead is:

/usr/local/lib/python2.7/dist-packages/numpy/lib/npyio.py:1641: ConversionWarning: Some errors were detected !
    Line #3 (got 2 columns instead of 3)
  warnings.warn(errmsg, ConversionWarning)
[[1.0 2.0 3.0]
 [4.0 5.0 6.0]]

Answer

The issue is that numpy doesn't like ragged arrays. Since there is no character in the third position of the last row of the file, genfromtxt doesn't even know there is something to parse there, let alone what to do with it. If the missing value had a filler (any filler) such as:

1 2 3
4 5 6
7 8 ''

then you would be able to do:

sol = np.genfromtxt("a.txt",
                dtype=float,
                invalid_raise=False,
                missing_values='',
                usemask=False,
                filling_values=0.0)

and:

sol

array([[  1.,   2.,   3.],
       [  4.,   5.,   6.],
       [  7.,   8.,  nan]])
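
Note that the array above still carries nan in the last slot rather than the 0.0 asked for in the question. One possible follow-up step, which is not part of the original answer, is to swap the nan out after loading. The sketch below assumes the array is rebuilt by hand as a stand-in for the genfromtxt result, and the name filled is only illustrative:

```python
import numpy as np

# Stand-in for the array returned by genfromtxt above (assumption:
# typed in by hand here instead of being read from a.txt).
sol = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0],
                [7.0, 8.0, np.nan]])

# Replace every nan with the desired fill value (0.0 here).
filled = np.where(np.isnan(sol), 0.0, sol)
print(filled)
```

Keep in mind that this replaces every nan in the array, so it only fits when nan can mean nothing other than "missing" in the data.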

Unfortunately, if making the columns of the file uniform isn't an option, you might be stuck with line-by-line parsing.
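
As a rough illustration of what line-by-line parsing could look like, here is a minimal sketch. The helper name load_ragged and its fill parameter are hypothetical, not part of numpy. It assumes whitespace-delimited numeric tokens, at least one non-blank line, and that values can only be missing at the end of a row:

```python
import numpy as np

def load_ragged(lines, fill=0.0):
    """Parse whitespace-delimited numbers, padding short rows with `fill`.

    `lines` is any iterable of strings, e.g. an open file or a list.
    """
    # Split each non-blank line into a list of floats.
    rows = [[float(tok) for tok in line.split()]
            for line in lines if line.strip()]
    # The widest row decides the number of columns.
    width = max(len(row) for row in rows)
    # Start from an array full of the fill value, then copy each row in.
    out = np.full((len(rows), width), fill, dtype=float)
    for i, row in enumerate(rows):
        out[i, :len(row)] = row
    return out

# Same content as a.txt, given as a list of lines.
sol = load_ragged(["1 2 3", "4 5 6", "7 8"])
print(sol)
```

For a file on disk, the same helper could be called as load_ragged(open("a.txt")). Because a short row is padded on the right, this cannot recover a value that is missing from the middle of a row.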

One other possibility applies if all the "short" rows are at the end. In that case you might be able to use the usecols argument to parse the columns that are present in every row, and then the skip_footer argument to parse the remaining columns while skipping the rows where they aren't available:

sol = np.genfromtxt("a.txt",
                dtype=float,
                invalid_raise=False,
                usemask=False,
                filling_values=0.0,
                usecols=(0,1))
sol
array([[ 1.,  2.],
       [ 4.,  5.],
       [ 7.,  8.]])

sol2 = np.genfromtxt("a.txt",
                dtype=float,
                invalid_raise=False,
                usemask=False,
                filling_values=0.0,
                usecols=(2,),
                skip_footer=1)
sol2
array([ 3.,  6.])

And then combine the arrays from there, adding the fill value:

sol2=np.append(sol2, 0.0)
sol2=sol2.reshape(3,1)
sol=np.hstack([sol,sol2])
sol
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 7.,  8.,  0.]])
