numpy.genfromtxt with datetime.strptime converter

Question

I have data similar to that seen in this gist, and I am trying to extract it with NumPy. I am rather new to Python, so I tried the following code:

import numpy as np
from datetime import datetime

# Parse the time column into datetime objects
convertfunc = lambda x: datetime.strptime(x, '%H:%M:%S:.%f')
col_headers = ["Mass", "Thermocouple", "T O2 Sensor",
               "Igniter", "Lamps", "O2", "Time"]
# Skip the 22 header lines and apply the converter to the "Time" column
data = np.genfromtxt(files[1], skip_header=22,
                     names=col_headers,
                     converters={"Time": convertfunc})

As can be seen in the gist, there are 22 rows of header material. In IPython, when I run the code above, I receive an error that ends with the following:

TypeError: float() argument must be a string or a number

The full IPython error trace can be seen here.

I am able to extract the six columns of numeric data just fine by passing an argument such as usecols=range(0,6) to genfromtxt, but when I try to use a converter to tackle the last column, I'm stumped. Any and all comments would be appreciated!
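
For reference, the numeric-only approach described in the question can be sketched as follows. The sample text is a hypothetical stand-in for the real file (two header lines instead of 22, and made-up values), since the gist itself is not reproduced here.

```python
import io

import numpy as np

# Hypothetical sample: two header lines, six numeric columns, one time column.
sample = """\
header line 1
header line 2
0.2620 21.5 20.1 0 1 20.9 15:49:24.546
0.2618 21.6 20.1 0 1 20.9 15:49:25.171
"""

# Reading only columns 0..5 leaves the time column unparsed,
# so the result is a plain 2-D float array.
numeric = np.genfromtxt(io.StringIO(sample), skip_header=2,
                        usecols=range(0, 6))
print(numeric.shape)  # (2, 6)
```

This works because every selected column is numeric. The difficulty only appears once the time column has to be read as well.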

Recommended answer

This happens because np.genfromtxt tries to create a float array, which fails because convertfunc returns a datetime object, and a datetime cannot be cast to float. The easiest solution is to pass dtype='object' to np.genfromtxt, which ensures that an object array is created and prevents the conversion to float. However, the other columns would then be saved as strings. To have them saved as floats, you need to specify the dtype of each column, which gives you a structured array. Here I'm setting them all to double, except the last column, which will be an object dtype:

# One (name, dtype) pair per column: double for the numeric columns, object for "Time"
dd = [(a, 'd') for a in col_headers[:-1]] + [(col_headers[-1], 'object')]
data = np.genfromtxt(files[1], skip_header=22, dtype=dd,
                     names=col_headers, converters={'Time': convertfunc})

This will give you a structured array which you can access with the names you gave:

In [74]: data['Mass']
Out[74]: array([ 0.262 ,  0.2618,  0.2616,  0.2614])
In [75]: data['Time']
Out[75]: array([1900-01-01 15:49:24.546000, 1900-01-01 15:49:25.171000,
                1900-01-01 15:49:25.405000, 1900-01-01 15:49:25.624000], 
                dtype=object)
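
To try the approach without the original file, here is a minimal, self-contained sketch. Several things in it are assumptions rather than part of the original post: the in-memory sample data, the two header lines (instead of 22), the time format '%H:%M:%S.%f' (chosen to match the sample, so adjust it to whatever the real file contains), the helper name parse_time, and the explicit encoding argument. Depending on the NumPy version and the encoding setting, converters may receive either bytes or str, so the helper accepts both.

```python
import io
from datetime import datetime

import numpy as np

# Hypothetical sample: two header lines, six numeric columns, one time column.
sample = """\
header line 1
header line 2
0.2620 21.5 20.1 0 1 20.9 15:49:24.546
0.2618 21.6 20.1 0 1 20.9 15:49:25.171
"""

col_headers = ["Mass", "Thermocouple", "T O2 Sensor",
               "Igniter", "Lamps", "O2", "Time"]


def parse_time(value):
    """Convert a time string (or bytes) such as 15:49:24.546 to a datetime."""
    if isinstance(value, bytes):
        value = value.decode()
    # Assumed format for the sample above; the real file may need another one.
    return datetime.strptime(value, "%H:%M:%S.%f")


# Double for every numeric column, object for the time column.
dd = [(name, "d") for name in col_headers[:-1]] + [(col_headers[-1], "object")]

data = np.genfromtxt(io.StringIO(sample), skip_header=2, dtype=dd,
                     names=col_headers, converters={"Time": parse_time},
                     encoding="utf-8")

print(data["Mass"])     # float64 column
print(data["Time"][0])  # datetime object with the default date 1900-01-01
print(data.dtype.names) # genfromtxt may sanitize names, e.g. replace spaces
```

Because strptime is given no date, every value carries the placeholder date 1900-01-01, as in the output above. If only the time of day matters, calling .time() on each element is one way to drop it.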
