Reading tab-delimited file with Pandas - works on Windows, but not on Mac

Question

I've been reading a tab-delimited data file on Windows with Pandas/Python without any problems. The data file contains notes in the first three lines, followed by a header row.

import pandas as pd
df = pd.read_csv(myfile, sep='\t', skiprows=(0, 1, 2), header=0)

I'm now trying to read the same file on my Mac. (This is my first time using Python on a Mac.) I get the following error:

pandas.parser.CParserError: Error tokenizing data. C error: Expected 1 fields in line 8, saw 39

If I set the error_bad_lines argument of read_csv to False, I get the following messages, which continue through the last row:

Skipping line 8: expected 1 fields, saw 39
Skipping line 9: expected 1 fields, saw 125
Skipping line 10: expected 1 fields, saw 125
Skipping line 11: expected 1 fields, saw 125
Skipping line 12: expected 1 fields, saw 125
Skipping line 13: expected 1 fields, saw 125
Skipping line 14: expected 1 fields, saw 125
Skipping line 15: expected 1 fields, saw 125
Skipping line 16: expected 1 fields, saw 125
Skipping line 17: expected 1 fields, saw 125
...

Do I need to specify a value for the encoding argument? It seems as though I shouldn't have to, because reading the file works fine on Windows.

Recommended answer

The biggest clue is that the rows are all being returned on one line. This indicates that line terminators are being ignored or are not present.
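One way to check this before touching the parser is to count the terminators actually present in the file. The following is a minimal sketch using only the standard library; the helper names count_line_endings and inspect_file are hypothetical, and the default encoding is an assumption that should be changed to match the file (for example utf-16).

```python
def count_line_endings(text):
    """Count CRLF, lone CR and lone LF terminators in a string."""
    crlf = text.count("\r\n")
    return {
        "crlf": crlf,
        "cr": text.count("\r") - crlf,  # carriage returns not followed by \n
        "lf": text.count("\n") - crlf,  # line feeds not preceded by \r
    }


def inspect_file(path, encoding="utf-8"):
    """Count terminators in a file (encoding is an assumption; adjust as needed)."""
    # newline="" disables newline translation, so the original
    # terminators reach count_line_endings unchanged.
    with open(path, encoding=encoding, newline="") as fh:
        return count_line_endings(fh.read())


# Hypothetical sample with bare carriage returns as terminators.
print(count_line_endings("a\tb\r1\t2\r3\t4\r"))
# prints {'crlf': 0, 'cr': 3, 'lf': 0}
```

A file that reports only "cr" terminators is a candidate for the lineterminator approach described next.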

You can specify the line terminator for read_csv. If the file was written with Mac-style (classic Mac OS) line endings, each line ends with \r rather than the Linux standard \n or, better still, the belt-and-suspenders approach of Windows with \r\n.

pandas.read_csv(filename, sep='\t', lineterminator='\r')
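To see the effect end to end, here is a small sketch that parses an in-memory sample instead of a real file. The sample contents (three note lines, a header, two data rows) are made up for illustration and only mimic the layout described in the question.

```python
import io

import pandas as pd

# Hypothetical sample: three note lines, a header row, then two data rows,
# every line terminated by a bare carriage return.
raw = b"note 1\rnote 2\rnote 3\rid\tvalue\r1\t10\r2\t20\r"

# lineterminator tells the parser to split rows on \r;
# skiprows=3 drops the three note lines before the header.
df = pd.read_csv(io.BytesIO(raw), sep="\t", skiprows=3, lineterminator="\r")

print(df)
```

The lineterminator argument accepts a single character, so it suits files that use \r consistently.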

You could also open your data using the codecs package. This may increase robustness at the expense of document loading speed.

import codecs
import pandas

# Open for reading, decoding UTF-16; mode 'r' is used because 'rU' is no longer accepted as of Python 3.11
doc = codecs.open('document', 'r', 'UTF-16')

df = pandas.read_csv(doc, sep='\t')
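On Python 3, a possible alternative is to skip the separate file handle and pass the encoding directly to read_csv. The sketch below assumes the file really is UTF-16, as the answer above does; the temporary file and its contents are hypothetical and exist only to keep the example self-contained.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample: three note lines, a header row, then two data rows.
sample = "note 1\nnote 2\nnote 3\nid\tvalue\n1\t10\n2\t20\n"

# Write the sample as UTF-16 to a temporary file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", encoding="utf-16", newline="", delete=False
) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # encoding= tells pandas how to decode the bytes before parsing.
    df = pd.read_csv(path, sep="\t", skiprows=3, encoding="utf-16")
finally:
    os.remove(path)

print(df)
```

If the encoding is wrong, the tab and newline bytes are not recognized as such, which can produce the same kind of field-count error shown in the question, so it is worth confirming the encoding on both machines.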
