Reading tab-delimited file with Pandas - works on Windows, but not on Mac


Question

I've been reading a tab-delimited data file on Windows with Pandas/Python without any problems. The data file contains notes in its first three lines, followed by a header row.

import pandas as pd
df = pd.read_csv(myfile, sep='\t', skiprows=(0, 1, 2), header=0)

I'm now trying to read this file on my Mac (my first time using Python on a Mac), and I get the following error.

pandas.parser.CParserError: Error tokenizing data. C error: Expected 1 fields in line 8, saw 39

If I set the error_bad_lines argument of read_csv to False, I get the following messages, which continue through to the last row.

Skipping line 8: expected 1 fields, saw 39
Skipping line 9: expected 1 fields, saw 125
Skipping line 10: expected 1 fields, saw 125
Skipping line 11: expected 1 fields, saw 125
Skipping line 12: expected 1 fields, saw 125
Skipping line 13: expected 1 fields, saw 125
Skipping line 14: expected 1 fields, saw 125
Skipping line 15: expected 1 fields, saw 125
Skipping line 16: expected 1 fields, saw 125
Skipping line 17: expected 1 fields, saw 125
...

Do I need to specify a value for the encoding argument? It seems as though I shouldn't have to, because reading the file works fine on Windows.

Recommended answer

The biggest clue is that the rows are all being returned on one line. This indicates that the line terminators are being ignored or are not present.
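One way to check this before changing any parser options is to count the terminators in the raw bytes. The following is a minimal sketch, not part of the original answer: the function name count_line_endings and the sample bytes are hypothetical, and it assumes a single-byte or UTF-8 encoding (in a UTF-16 file the terminators are interleaved with null bytes, so these counts would be misleading).

```python
def count_line_endings(raw: bytes) -> dict:
    """Count CRLF pairs, lone CR, and lone LF terminators in raw bytes."""
    crlf = raw.count(b"\r\n")
    return {
        "crlf": crlf,
        "cr": raw.count(b"\r") - crlf,  # carriage returns not followed by \n
        "lf": raw.count(b"\n") - crlf,  # line feeds not preceded by \r
    }

# Hypothetical sample standing in for the first bytes of the data file
sample = b"note 1\rnote 2\rnote 3\rcol_a\tcol_b\r1\t2\r"
print(count_line_endings(sample))
```

For a real file, the same function could be applied to the first few kilobytes read in binary mode, for example open(myfile, 'rb').read(4096). A result dominated by "cr" would point to bare carriage returns as the terminator.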

You can specify the line terminator for read_csv. Files created on a Mac may end each line with \r (the classic Mac OS convention), rather than the Linux standard \n or, better still, the belt-and-suspenders approach of Windows with \r\n.

pandas.read_csv(filename, sep='\t', lineterminator='\r')

You could also open all your data using the codecs module. This may increase robustness at the expense of loading speed.

import codecs
import pandas

# 'UTF-16' must match the file's actual encoding; the 'U' flag of the original 'rU' mode was removed in Python 3.11
doc = codecs.open('document', 'r', 'UTF-16')

df = pandas.read_csv(doc, sep='\t')
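The "universal" newline handling mentioned above is also available in Python 3 through the newline=None argument of the built-in open() and io.TextIOWrapper. This is offered as a possible alternative rather than as the answer's own method. The sketch below uses hypothetical in-memory sample data, and it parses with the standard-library csv module only to keep the example self-contained.

```python
import csv
import io

# Hypothetical stand-in for a UTF-16 file whose lines end in a bare carriage return
raw = "col_a\tcol_b\r1\t2\r3\t4\r".encode("utf-16")

# newline=None enables universal newlines: \r, \n and \r\n are all
# translated to \n as the text is read
with io.TextIOWrapper(io.BytesIO(raw), encoding="utf-16", newline=None) as handle:
    text = handle.read()

# Parse the normalized text as tab-delimited rows
rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
print(rows)
```

For a file on disk, open(myfile, encoding='utf-16', newline=None) should behave the same way, and the resulting handle can be passed to pandas.read_csv with sep='\t', since read_csv accepts file-like objects. The encoding shown is an assumption and must match the actual file.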
