Parsing a data matrix containing HH:MM:SS.mmm times using numpy.loadtxt

Question

I know I can do something like

numpy.loadtxt('data.txt', dtype={'names': ('time', 'magnitude'),
                                 'formats': ('S12', 'f8')})

but this gives me the times as strings. How can I convert them to floats?

Recommended answer

You could use the converters parameter of np.loadtxt to apply a function to each string in the first column. Calling a Python function once per row may slow np.loadtxt down considerably, but this can still be a workable solution for moderate-sized files:

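import numpy as np

def parse_date(datestr):
    # Convert 'HH:MM:SS.mmm' to seconds since midnight.
    # Depending on the NumPy version, converters may receive bytes or str.
    if isinstance(datestr, bytes):
        datestr = datestr.decode()
    return sum(multiplier * val for multiplier, val in
               zip((3600, 60, 1), map(float, datestr.split(':'))))


x = np.loadtxt('data', dtype={'names': ('time', 'magnitude'), 'formats': ('f8', 'f8')},
               converters={0: parse_date})
print(x)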
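
As a quick check of the converter approach, here is a minimal sketch that runs the same idea on two in-memory rows instead of a file. The sample values and the helper name to_seconds are made up for illustration and are not part of the original answer. The bytes check is there on the assumption that some NumPy versions hand bytes to converters while others hand str.

```python
import io
import numpy as np

def to_seconds(timestr):
    # Hypothetical helper: 'HH:MM:SS.mmm' -> seconds since midnight.
    # Some NumPy versions pass bytes to converters, others pass str.
    if isinstance(timestr, bytes):
        timestr = timestr.decode()
    hours, minutes, seconds = (float(part) for part in timestr.split(':'))
    return 3600 * hours + 60 * minutes + seconds

# Made-up sample data: a time column and a magnitude column.
sample = io.StringIO("01:02:03.500 1.5\n00:00:01.250 2.0\n")
x = np.loadtxt(sample,
               dtype={'names': ('time', 'magnitude'), 'formats': ('f8', 'f8')},
               converters={0: to_seconds})
print(x['time'])       # times in seconds since midnight
print(x['magnitude'])  # magnitudes, parsed by the default float conversion
```

Two rows are used deliberately, so that the result is a one-dimensional structured array rather than a single record.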


Alternatively, you could load the times as strings with loadtxt and convert them to floats afterwards, like this:

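# Load the time column as 12-byte strings ('HH:MM:SS.mmm') first.
x = np.loadtxt('data', dtype={'names': ('time', 'magnitude'), 'formats': ('S12', 'f8')})
# 'S12' fields are bytes on Python 3, so split on a bytes separator.
arr = np.char.split(x['time'], b':')
# http://stackoverflow.com/a/19459439/190597 (Jaime)
# np.float was removed in NumPy 1.24; the builtin float is equivalent.
newarr = np.fromiter((tuple(row) for row in arr), dtype=[('', float)]*3,
                     count=len(arr)).view('float').reshape(-1, 3)
# Weight hours, minutes, seconds and sum each row to get seconds.
times = (newarr * [3600,60,1]).sum(axis=1)

y = np.empty_like(x, dtype={'names': ('time', 'magnitude'), 'formats': ('f8', 'f8')})
y['time'] = times
y['magnitude'] = x['magnitude']
print(y)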
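
The last arithmetic step, multiplying by [3600, 60, 1] and summing along axis 1, relies on NumPy broadcasting. A small sketch with made-up (hours, minutes, seconds) rows may make it easier to see what that line computes; the variable names are illustrative only.

```python
import numpy as np

# Made-up (hours, minutes, seconds) triples, one row per timestamp.
parts = np.array([[1.0, 2.0, 3.5],
                  [0.0, 0.0, 1.25],
                  [23.0, 59.0, 59.0]])

# The weights broadcast across every row; summing along axis 1
# collapses each row into a single number of seconds.
weights = np.array([3600.0, 60.0, 1.0])
seconds = (parts * weights).sum(axis=1)

# A matrix-vector product computes the same quantity.
seconds_dot = parts @ weights

print(seconds)
```

Either form avoids a Python-level loop over rows for this step, which is presumably where the second method gains its speed advantage.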


I created a test file of 10**6 lines to see which method is faster. The second method is a bit faster (5.59 s versus 6.88 s per loop):

In [329]: %timeit using_fromiter()
1 loops, best of 3: 5.59 s per loop


In [328]: %timeit using_converter()
1 loops, best of 3: 6.88 s per loop


import os
import numpy as np

# Shared by the writer and both readers; assumes ~/tmp already exists.
FILENAME = os.path.expanduser('~/tmp/data')

def create_data(N):
    # Write N lines of 'H:M:S.ssssss magnitude' with random times of day.
    data = np.random.random(size=N)*86400
    hours, remainder = divmod(data, 3600)
    minutes, seconds = divmod(remainder, 60)
    mag = np.arange(N)
    with open(FILENAME, 'w') as f:
        for h,m,s,a in np.column_stack([hours, minutes, seconds, mag]):
            f.write('{h:d}:{m:d}:{s:.6f} {a}\n'.format(h=int(h), m=int(m), s=s, a=a))

def parse_date(datestr):
    # Depending on the NumPy version, converters may receive bytes or str.
    if isinstance(datestr, bytes):
        datestr = datestr.decode()
    return sum(multiplier * val for multiplier, val in
               zip((3600, 60, 1), map(float, datestr.split(':'))))

def using_converter():
    # Method 1: convert each time string while loading.
    x = np.loadtxt(FILENAME, dtype={'names': ('time', 'magnitude'),
                                    'formats': ('f8', 'f8')},
                   converters={0: parse_date})
    return x

def using_fromiter():
    # Method 2: load times as bytes strings, then convert in bulk.
    x = np.loadtxt(FILENAME, dtype={'names': ('time', 'magnitude'), 'formats': ('S12', 'f8')})
    arr = np.char.split(x['time'], b':')
    # np.float was removed in NumPy 1.24; the builtin float is equivalent.
    newarr = np.fromiter((tuple(row) for row in arr), dtype=[('', float)]*3,
                         count=len(arr)).view('float').reshape(-1, 3)
    times = (newarr * [3600,60,1]).sum(axis=1)

    y = np.empty_like(x, dtype={'names': ('time', 'magnitude'), 'formats': ('f8', 'f8')})
    y['time'] = times
    y['magnitude'] = x['magnitude']
    return y

create_data(10**6)
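
The timings above use the IPython %timeit magic. Outside IPython, a similar comparison could be sketched with the standard-library timeit module. The helper name best_of and the stand-in workload below are hypothetical and exist only so the sketch runs on its own; in practice you would pass using_converter and using_fromiter instead.

```python
import timeit

def best_of(func, repeat=3):
    # Call func once per trial and return the fastest time in seconds.
    return min(timeit.repeat(func, number=1, repeat=repeat))

def stand_in_workload():
    # Placeholder so the sketch is self-contained; swap in the real functions.
    return sum(i * i for i in range(10000))

print('best of 3: {:.6f} s'.format(best_of(stand_in_workload)))
```

Taking the minimum over a few repeats mirrors the "best of 3" figure that %timeit reports in the output shown earlier.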
