np.fromregex with string as dtype


Question

I have a file with dates formatted as "1:*? year mo da ho mi se.condsdec" (with "?" being a one-character wildcard), i.e.:

*A 2014 12 31 23 59 59.123456

I would like to extract this as strings (to eventually be converted to datetime strings).

I am able to extract the date as a set of ints/floats using the regex pattern:

time_pattern=r'\*.{2}(\d{4}) (\d{2}) (\d{2}) (\d{2}) (\d{2}) (\d{2}\.\d{8})'

but not as a string. How do I get this to work using a string?

I am using Python 3.4.3 with NumPy 1.9.3.

import numpy as np
time_pattern=r'\*.{2}(\d{4}) (\d{2}) (\d{2}) (\d{2}) (\d{2}) (\d{2}\.\d{8})'
t_dtype=[('year',np.int16),('month',np.int8),('day',np.int8),\
('hour',np.int8),('min',np.int8),('sec',np.float64)]
out=np.fromregex('filename',time_pattern,t_dtype)
print(out)
#returns [(2013, 11, 26, 0, 0, 10.0) (2013, 11, 26, 0, 0, 20.0)
# (2013, 11, 26, 0, 0, 30.0)]


basic_t=r'$\*.{2}(.{28})'
t_dtype=[('date',str)]
out=np.fromregex('filename',basic_t,t_dtype)
# raises TypeError: Empty data-type

Using the file filename:

*  2003 11 26 00 00 10.00000000  
some text or interesting data                      
*  2003 11 26 00 00 20.00000000
more text
even more text                         
*  2003 11 26 00 00 30.00000000    
etc.  

Note that the pattern is simple to apply with a plain loop:

with open(file) as f: 
   for line in f: 
      m=re.search(basic_t,line)

But I would like to have the output as a NumPy array, and I would like to keep the runtime to a minimum.

Edit: Changing the dtype to 'S' or np.str removes the error, but I still get an empty list as output.

Recommended answer

Your problem is that you are setting the dtype as int or float when you should be specifying the fields as np.str_. You also need to specify the length of each string, so:

import numpy as np

time_pattern=r'\*.{2}(\d{4}) (\d{2}) (\d{2}) (\d{2}) (\d{2}) (\d{2}\.\d{8})'
t_dtype=[('year',np.str_,4),('month',np.str_,2),('day',np.str_,2),\
('hour',np.str_,2),('min',np.str_,2),('sec',np.str_,11)]  # 'sec' is 2 digits + '.' + 8 decimals

out=np.fromregex('filename',time_pattern,t_dtype)
print(out)
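
The following is a minimal, self-contained sketch of the approach above, not a definitive implementation. It assumes the sample file from the question, writes it to a temporary file (the name dates.txt is made up for illustration), and assumes a NumPy version in which np.fromregex reads the file as text on Python 3. The 'U4'-style strings are shorthand for (np.str_, 4).

```python
import os
import tempfile

import numpy as np

# Sample data copied from the question; the file name below is illustrative only.
sample = (
    "*  2003 11 26 00 00 10.00000000\n"
    "some text or interesting data\n"
    "*  2003 11 26 00 00 20.00000000\n"
    "more text\n"
    "even more text\n"
    "*  2003 11 26 00 00 30.00000000\n"
    "etc.\n"
)

time_pattern = r'\*.{2}(\d{4}) (\d{2}) (\d{2}) (\d{2}) (\d{2}) (\d{2}\.\d{8})'

# One fixed-width unicode field per capture group.
# 'U11' holds the seconds: 2 digits + '.' + 8 decimals.
t_dtype = [('year', 'U4'), ('month', 'U2'), ('day', 'U2'),
           ('hour', 'U2'), ('min', 'U2'), ('sec', 'U11')]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'dates.txt')
    with open(path, 'w') as f:
        f.write(sample)
    out = np.fromregex(path, time_pattern, t_dtype)

print(out['year'])  # each field is an array of strings
print(out['sec'])
```

Each field width has to be at least as long as the text captured by its group, because NumPy silently truncates strings that are longer than the declared width.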

If you look at the second example of this, it shows how to handle strings.
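
For the single-string variant from the question, here is a hedged sketch. The leading `$` in basic_t only matches at the end of the input, so `$\*` can never be followed by a `*`; that would explain the empty result reported in the edit. The version below drops the `$` and gives the field an explicit width of 28. The helper to_iso and the conversion to datetime64[ns] are assumptions added here for illustration and are not part of the original answer.

```python
import os
import re
import tempfile

import numpy as np

sample = (
    "*  2003 11 26 00 00 10.00000000\n"
    "some text or interesting data\n"
    "*  2003 11 26 00 00 20.00000000\n"
    "more text\n"
    "*  2003 11 26 00 00 30.00000000\n"
)

# The original pattern starts with '$', so it matches nothing in this input.
no_match = re.findall(r'$\*.{2}(.{28})', sample)

# Same pattern without the '$': one group capturing the 28-character date.
basic_t = r'\*.{2}(.{28})'
date_dtype = [('date', np.str_, 28)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'dates.txt')  # illustrative file name
    with open(path, 'w') as f:
        f.write(sample)
    out = np.fromregex(path, basic_t, date_dtype)


def to_iso(date_string):
    """Hypothetical helper: 'Y m d H M S.f' -> ISO 8601 'Y-m-dTH:M:S.f'."""
    y, mo, d, h, mi, sec = date_string.split()
    return f"{y}-{mo}-{d}T{h}:{mi}:{sec}"


# NumPy parses ISO 8601 strings directly into datetime64 values.
times = np.array([to_iso(s) for s in out['date']], dtype='datetime64[ns]')
print(times)
```

Going through ISO strings avoids datetime.strptime's %f directive, which accepts at most six fractional digits, while the file carries eight.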
