Read values from a .csv file and convert them to float arrays

Question

I stumbled upon a little coding problem. I have to basically read data from a .csv file which looks a lot like this:

2011-06-19 17:29:00.000,72,44,56,0.4772,0.3286,0.8497,31.3587,0.3235,0.9147,28.5751,0.3872,0.2803,0,0.2601,0.2073,0.1172,0,0.0,0,5.8922,1,0,0,0,1.2759

Now, I basically need to read an entire file consisting of rows like this and parse them into numpy arrays. So far, I have been able to get them into a big string-type object using code similar to this:

order_hist = np.loadtxt(filename_input,delimiter=',',dtype={'names': ('Year', 'Mon', 'Day', 'Stock', 'Action', 'Amount'), 'formats': ('i4', 'i4', 'i4', 'S10', 'S10', 'i4')})

The format for this file consists of a set of S20 data types as of now. I basically need to extract all of the data in the big order_hist array into a set of arrays, one per column. I do not know how to save the date-time column (I've kept it as a string for now). I need to convert the rest to float, but the code below gives me an error:

    temparr=float[:len(order_hist)]
    for x in range(len(order_hist['Stock'])): 
        temparr[x]=float(order_hist['Stock'][x]);

Can someone show me just how I can convert all the columns to the arrays that I need??? Or possibly direct me to some link to do so?

Recommended answer

Boy, have I got a treat for you. numpy.genfromtxt has a converters parameter, which allows you to specify a function for each column as the file is parsed. The function is fed the CSV string value. Its return value becomes the corresponding value in the numpy array.
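To make the converter idea concrete, here is a minimal sketch of such a function on its own, using only the standard library. The name parse_timestamp is hypothetical, and the bytes check is an assumption added for safety: depending on the NumPy version and the encoding argument, genfromtxt may hand the converter bytes rather than str.

```python
import datetime as dt

def parse_timestamp(value):
    """Hypothetical converter: turn one CSV field into a datetime object."""
    # Some NumPy versions pass bytes to converters, others pass str,
    # so normalise to str before parsing.
    if isinstance(value, bytes):
        value = value.decode("ascii")
    return dt.datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")

# The same field as str and as bytes should parse to the same datetime.
print(parse_timestamp("2011-06-19 17:29:00.000"))
print(parse_timestamp(b"2011-06-19 17:29:00.000"))
```

A function like this can be passed in the converters dictionary, keyed by column name or column index.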

Moreover, the dtype=None parameter tells genfromtxt to make an intelligent guess as to the type of each column. In particular, numeric columns are automatically cast to an appropriate dtype.

For example, suppose your data file contains

2011-06-19 17:29:00.000,72,44,56

then

import numpy as np
import datetime as DT

def make_date(datestr):
    return DT.datetime.strptime(datestr, '%Y-%m-%d %H:%M:%S.%f')

arr = np.genfromtxt(filename, delimiter = ',',
                    converters = {'Date':make_date},
                    names =  ('Date', 'Stock', 'Action', 'Amount'),
                    dtype = None)
print(arr)
print(arr.dtype)

yields

(datetime.datetime(2011, 6, 19, 17, 29), 72, 44, 56)
[('Date', '|O4'), ('Stock', '<i4'), ('Action', '<i4'), ('Amount', '<i4')]

Your real csv file has more columns, so you'd want to add more items to names, but otherwise, the example should still stand.

If you don't really care about the extra columns, you can assign a fluff-name like this:

arr = np.genfromtxt(filename, delimiter=',',
                    converters={'Date': make_date},
                    names=('Date', 'Stock', 'Action', 'Amount') +
                    tuple('col{i}'.format(i=i) for i in range(22)),
                    dtype = None)

yields

(datetime.datetime(2011, 6, 19, 17, 29), 72, 44, 56, 0.4772, 0.3286, 0.8497, 31.3587, 0.3235, 0.9147, 28.5751, 0.3872, 0.2803, 0, 0.2601, 0.2073, 0.1172, 0, 0.0, 0, 5.8922, 1, 0, 0, 0, 1.2759)
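The question asks for one float array per column. Once the file is loaded as a structured array, each column can be pulled out by name and cast with astype. The following is only a sketch under stated assumptions: the two sample rows are made up and shortened to six columns, io.StringIO stands in for the real file, encoding="utf-8" is passed so the converter receives str on recent NumPy versions, and the names col0 and col1 are placeholders.

```python
import io
import datetime as dt
import numpy as np

# Made-up sample rows in the question's layout, shortened to six columns.
SAMPLE = (
    "2011-06-19 17:29:00.000,72,44,56,0.4772,0.3286\n"
    "2011-06-19 17:30:00.000,73,45,57,0.5000,0.25\n"
)

def make_date(value):
    # Accept bytes as well as str, since older NumPy versions pass bytes.
    if isinstance(value, bytes):
        value = value.decode("ascii")
    return dt.datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")

names = ("Date", "Stock", "Action", "Amount", "col0", "col1")
arr = np.genfromtxt(io.StringIO(SAMPLE), delimiter=",",
                    converters={"Date": make_date},
                    names=names, dtype=None, encoding="utf-8")

# The date column stays an object array of datetime values.
dates = arr["Date"]

# Every other column becomes its own 1-D float array, keyed by column name.
floats = {name: arr[name].astype(float) for name in names[1:]}

print(dates)
print(floats["Stock"])
print(floats["col1"])
```

Note that a file with a single row produces a 0-dimensional structured array, so indexing by row only makes sense when there are at least two rows.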


You might also be interested in checking out the pandas module which is built on top of numpy, and which takes parsing CSV to an even higher level of luxury: It has a pandas.read_csv function whose parse_dates = True parameter will automatically parse date strings (using dateutil).

Using pandas, your csv could be parsed with

import pandas as pd

# Only column 0 holds a date string, so only that column is parsed as a date.
df = pd.read_csv(filename, parse_dates=[0], header=None,
                 names=('Date', 'Stock', 'Action', 'Amount') +
                 tuple('col{i}'.format(i=i) for i in range(22)))

Note there is no need to specify the make_date function. Just to be clear: pandas.read_csv returns a DataFrame, not a numpy array. The DataFrame may actually be more useful for your purpose, but you should be aware it is a different object with a whole new world of methods to exploit and explore.
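If plain numpy float arrays are still what you need, a DataFrame can be converted back. The following is a rough sketch rather than a definitive recipe: the sample rows are made up and shortened to six columns, io.StringIO stands in for the real file, and it assumes a pandas version that provides to_numpy.

```python
import io
import pandas as pd

# Made-up sample rows in the question's layout, shortened to six columns.
SAMPLE = (
    "2011-06-19 17:29:00.000,72,44,56,0.4772,0.3286\n"
    "2011-06-19 17:30:00.000,73,45,57,0.5000,0.25\n"
)

names = ["Date", "Stock", "Action", "Amount", "col0", "col1"]
df = pd.read_csv(io.StringIO(SAMPLE), header=None, names=names,
                 parse_dates=["Date"])

# A single column as a 1-D float array.
stock = df["Stock"].to_numpy(dtype=float)

# All numeric columns at once as a 2-D float array (rows x columns).
values = df.drop(columns="Date").to_numpy(dtype=float)

print(df["Date"].dtype)
print(stock)
print(values.shape)
```

Keeping the date column in the DataFrame and converting only the numeric columns avoids having to decide how to store timestamps in a float array.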
