How to read a custom table in pandas which has number string number number?

Problem description

I have been trying to read a custom table in pandas, but I keep getting errors.

Each row has the form: number string number number

  • Words are separated by exactly one space
  • Each word is either a number or a plain English word
  • There are no NaNs

filename: station.tsv

794 Kissee Mills MO 140 73 
824 Loma Mar CA 49 131 
603 Sandy Hook CT 72 148 
478 Tipton IN 34 98 
619 Arlington CO 75 93 
711 Turner AR 50 101 
839 Slidell LA 85 152 
411 Negreet LA 99 105 
588 Glencoe KY 46 136 
665 Chelsea IA 99 60
957 South El Monte CA 74 80


Note that the row `957 South El Monte CA 74 80` is actually the 33rd row of my data.
If it were only the 11th row, pandas would give no error,
but when it appears at a larger row number, pandas raises an error.

My attempt

import pandas as pd
df = pd.read_csv('station.tsv', header=None, sep=' ')

ParserError: Error tokenizing data. 
C error: Expected 7 fields in line 33, saw 8
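
One likely reading of this error is that the place name contains a varying number of words, so splitting on a single space yields a different number of fields per row. The sketch below only illustrates that counting on three sample rows; it uses plain `str.split`, not the pandas parser, and the rows are typed here without trailing spaces (an assumption made for clarity).

```python
# Three sample rows from station.tsv, written without trailing spaces.
rows = [
    "794 Kissee Mills MO 140 73",
    "478 Tipton IN 34 98",
    "957 South El Monte CA 74 80",
]

# Count the fields each row yields when split on a single space,
# which is what sep=' ' asks the parser to do.
field_counts = {row: len(row.split(" ")) for row in rows}

for row, n in field_counts.items():
    print(n, row)
# 6 794 Kissee Mills MO 140 73
# 5 478 Tipton IN 34 98
# 7 957 South El Monte CA 74 80
```

A trailing space adds one more (empty) field to a row, which would be consistent with the 7 and 8 reported in the message above.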

Question

Is there a way to parse the data with a regex, something like:

regexp = r'(\d+)\s+(\w+)\s+(\d+)\s+(\d+)'

The goal is to read the text data and build an array from it.

I am hoping to use NumPy, pandas, or any other Python library for this.

Recommended answer

You can specify a delimiter that matches either a space not preceded by a letter, (?<![a-zA-Z])\s, or (the | alternative) a space followed by a digit, \s(?=\d). Because the separator is a regular expression, it is handled by the Python parsing engine, hence engine='python'.
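
To see how this delimiter behaves on a single row, here is a minimal sketch using the standard `re` module rather than pandas. The sample row comes from the question; the variable names are illustrative only.

```python
import re

# Delimiter from the answer: a space not preceded by a letter,
# or a space followed by a digit.
sep = re.compile(r'(?<![a-zA-Z])\s|\s(?=\d)')

row = "957 South El Monte CA 74 80"
fields = sep.split(row)
print(fields)
# ['957', 'South El Monte CA', '74', '80']
```

The spaces inside the name are kept because each one follows a letter and precedes a letter. Note that a trailing space after the last number would also match (it is not preceded by a letter) and would produce an extra empty field, so stripping such lines first may be necessary.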

sep = r'(?<![a-zA-Z])\s|\s(?=\d)'
df = pd.read_csv('station.tsv', engine='python', sep=sep, header=None)

      0                  1    2    3
0   794    Kissee Mills MO  140   73
1   824        Loma Mar CA   49  131
2   603      Sandy Hook CT   72  148
3   478          Tipton IN   34   98
4   619       Arlington CO   75   93
5   711          Turner AR   50  101
6   839         Slidell LA   85  152
7   411         Negreet LA   99  105
8   588         Glencoe KY   46  136
9   665         Chelsea IA   99   60
10  957  South El Monte CA   74   80

df.dtypes
#0     int64
#1    object
#2     int64
#3     int64
