Pandas DataFrame Read Skipping line XXX: expected X fields, saw Y


Problem description

I can't figure out what's wrong with the CSV file I'm trying to load.

I get the following error message: b'Skipping line 2120260: expected 6 fields, saw 8\n'

But when I view the lines, they look OK to me. See below (I am going to press enter after each tab \t to make it easier to read).

Line 2,120,260 (failing): ['user_000104\t 2005-09-12T06:25:50Z\t a019a8cf-2601-4a81-b3c3-7b279a873713\t Anne Clark\t 8f8e6bc0-c3c0-4062-875a-773a1de6206f\t Empty Me']

Line 9,000 (not failing): ['user_000001\t 2008-06-15T17:28:31Z\t a3031680-c359-458f-a641-70ccbaec6a74\t Steve Reich\t 2991db42-3b19-4344-a340-605ac3fbd7e9\t Drumming: Part Iv']

If anyone wants to try it out for themselves, download this:

http://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/lastfm-1K.html

And run: inpFile2 = pd.read_csv(fPath, sep='\t', error_bad_lines=False)

which produces the error. And use:

    def checkRow(path, N):
        with open(path, 'r') as f:
            print("This is the line.")
            print(next(itertools.islice(csv.reader(f), N, None)))

to view the error row (pass in the file path and the row you are interested in). Make sure you import csv and import itertools.

Recommended answer

OK, I managed to get to the bottom of it.

The solution is to pass quoting=csv.QUOTE_NONE as a parameter to read_csv: inpFile = pd.read_csv(fPath, sep='\t', error_bad_lines=False, quoting=csv.QUOTE_NONE)

The reason is that one of the fields contains a double quote, which confuses Pandas, so it needs to be told not to treat quote characters as string delimiters. After making the above change, the file appears to load.
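
To illustrate why a stray double quote can change the field count, here is a minimal sketch using only the standard-library csv module (the same module that supplies the QUOTE_NONE constant). It is not the pandas parser itself, and the four sample rows and the field_counts helper are made up for the demonstration, not taken from the dataset. The second row has a field that starts with an unmatched quote, so under default quoting the reader keeps consuming tabs and newlines until it meets another quote.

```python
import csv
import io

# Hypothetical tab-separated sample (4 fields per row). Row 2 has a field
# that starts with an unmatched double quote; row 3 contains further quotes.
sample = (
    "user_000001\t2008-06-15T17:28:31Z\tSteve Reich\tDrumming: Part Iv\n"
    "user_000002\t2008-06-15T17:30:00Z\tSome Artist\t\"Unclosed title\n"
    "user_000003\t2008-06-15T17:35:00Z\t\"Quoted\" Artist\tAnother title\n"
    "user_000104\t2005-09-12T06:25:50Z\tAnne Clark\tEmpty Me\n"
)

def field_counts(text, quoting):
    """Return the number of fields the csv reader sees in each parsed row."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=quoting)
    return [len(row) for row in reader]

# Default behaviour: a quote at the start of a field opens a quoted field.
default_counts = field_counts(sample, csv.QUOTE_MINIMAL)

# QUOTE_NONE: quote characters are treated as ordinary data.
no_quote_counts = field_counts(sample, csv.QUOTE_NONE)

print("default quoting:", default_counts)
print("QUOTE_NONE:     ", no_quote_counts)
```

With default quoting, rows 2 and 3 are merged into a single row with an extra field, which is the same kind of mismatch the "expected 6 fields, saw 8" message reports. With QUOTE_NONE, every row keeps its four fields. Separately, note that in pandas 1.3 and later error_bad_lines is deprecated in favor of on_bad_lines='skip', and it was removed in pandas 2.0.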
