pandas.errors.ParserError: ',' expected after '"'

Problem description

I am trying to read this dataset from Kaggle: Amazon sales rank data for print and kindle books

The file amazon_com_extras.csv has a column named "TITLE" whose values sometimes contain a comma, so every field in this .csv is enclosed in quotation marks:

"ASIN","GROUP","FORMAT","TITLE","AUTHOR","PUBLISHER"
"022640014X","book","hardcover","The Diversity Bargain: And Other Dilemmas of Race, Admissions, and Meritocracy at Elite Universities","Natasha K. Warikoo","University Of Chicago Press"

I have read other questions related to this problem, but none of them solves it. For example, I have tried:

import pandas as pd

df = pd.read_csv("amazon_com_extras.csv", engine="python", sep=',')
df = pd.read_csv("amazon_com_extras.csv", engine="python", sep=',', quotechar='"')

Both calls raise the same ParserError. I am using Python 3.7.2 and pandas 0.24.1.

Recommended answer

This happens because some fields in the file contain unescaped quotation marks inside the quoted text.

I am not aware of a way to instruct the csv parser to handle that without preprocessing the file.

If you can afford to lose the affected rows, you can use:

pd.read_csv("amazon_com_extras.csv", engine="python", sep=',', quotechar='"', error_bad_lines=False)

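That stops the exception from being raised, but it drops the affected lines (pandas reports each skipped line in the console). Note that error_bad_lines was deprecated in pandas 1.3 and removed in pandas 2.0; on those versions the equivalent argument is on_bad_lines="skip", or on_bad_lines="warn" to keep the console messages.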

An example of such a line:

"1405246510","book","hardcover",""Hannah Montana" Annual 2010","Unknown","Egmont Books Ltd"

Note the quotes around Hannah Montana: they sit inside an already quoted field and are not escaped.

A more standard dialect of csv would instead have rendered it with the inner quotes doubled:

1405246510,"book","hardcover","""Hannah Montana"" Annual 2010","Unknown","Egmont Books Ltd"
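
The difference between the two renderings can be checked with Python's built-in csv module, whose strict mode produces the same message as in the title. This is only a small sketch on the two example lines above; the helper name parse_strict is made up for illustration, and it does not call pandas at all.

```python
import csv
import io

# The line as it appears in the file: inner quotes are not escaped.
bad_line = '"1405246510","book","hardcover",""Hannah Montana" Annual 2010","Unknown","Egmont Books Ltd"\n'
# The same record with the inner quotes doubled, as a standard dialect writes it.
good_line = '1405246510,"book","hardcover","""Hannah Montana"" Annual 2010","Unknown","Egmont Books Ltd"\n'


def parse_strict(line):
    """Parse one CSV line in strict mode; return (fields, error_message)."""
    try:
        return next(csv.reader(io.StringIO(line), strict=True)), None
    except csv.Error as exc:
        return None, str(exc)


bad_fields, bad_error = parse_strict(bad_line)
good_fields, good_error = parse_strict(good_line)

print(bad_error)       # the parser rejects the unescaped quote
print(good_fields[3])  # the title, with its inner quotes restored
```

With the doubled quotes, the fourth field comes back as the title including its quotation marks, and the record still has six fields.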

You can, for example, load the file in LibreOffice and re-save it as CSV to get a working CSV dialect, or use some other preprocessing technique.
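
As one possible preprocessing technique, the file can be rewritten in Python before pandas sees it. The sketch below rests on assumptions that should be checked against the real file: every field on every line is wrapped in double quotes, fields are separated by exactly the three characters `","`, no field contains that sequence or a line break, and inner quotes are never already doubled. The function names are hypothetical.

```python
import csv
import io


def split_fully_quoted_line(line):
    """Split one line in which every field is wrapped in double quotes.

    Assumption: fields are separated by exactly '","' and no field
    contains that sequence or a line break.
    """
    body = line.rstrip("\r\n")
    if len(body) < 2 or not (body.startswith('"') and body.endswith('"')):
        raise ValueError(f"line is not fully quoted: {body!r}")
    # Drop the outer quotes, then cut at every quote-comma-quote boundary.
    return body[1:-1].split('","')


def repair_csv(raw_text):
    """Return raw_text rewritten with standard (doubled) quote escaping."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for line in raw_text.splitlines():
        if line.strip():  # skip empty lines
            writer.writerow(split_fully_quoted_line(line))
    return out.getvalue()


sample = (
    '"ASIN","GROUP","FORMAT","TITLE","AUTHOR","PUBLISHER"\n'
    '"1405246510","book","hardcover",""Hannah Montana" Annual 2010","Unknown","Egmont Books Ltd"\n'
)
repaired = repair_csv(sample)

# The repaired text now parses cleanly in strict mode.
rows = list(csv.reader(io.StringIO(repaired), strict=True))
print(rows[1][3])
```

To use this on the real file, read it into a string with open(), pass the result of repair_csv wrapped in io.StringIO to pd.read_csv, and compare the row count with the number of lines in the original to confirm that nothing was split incorrectly.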
