CParserError: Error tokenizing data
Problem description
I'm having some trouble reading a CSV file:
import pandas as pd
df = pd.read_csv('Data_Matches_tekha.csv', skiprows=2)
I get
pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 526, saw 5
When I add sep=None to the read_csv call, I get another error:
Error: line contains NULL byte
I tried adding unicode='utf-8', and I even tried the csv reader, but nothing works with this file.
The CSV file is totally fine. I checked it and I see nothing wrong with it.
Here is the error I get:
Recommended answer
In your actual code, the line is:
>>> pandas.read_csv("Data_Matches_tekha.xlsx", sep=None)
You are trying to read an Excel file, not a plain-text CSV, which is why things are not working.
Excel files (xlsx) are stored in a special binary format and cannot be read as simple text files (like CSV files). That is also where the NULL byte in the second error comes from.
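A quick way to confirm that a file is really an Excel workbook, whatever its name says, is to look at its container format. The sketch below is only a heuristic: the helper name `looks_like_excel` is made up for this example, and it assumes that an .xlsx file is a zip archive and that a legacy .xls file starts with the OLE2 compound-document signature. Any other zip file will also pass the first check.

```python
import zipfile

# First eight bytes of the legacy .xls (OLE2 compound document) container.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def looks_like_excel(path):
    """Heuristic: True if the file's container matches .xlsx or .xls."""
    # .xlsx workbooks are zip archives of XML parts, so a zip check catches them
    # (along with any other zip file, which is why this is only a hint).
    if zipfile.is_zipfile(path):
        return True
    # Otherwise compare the leading bytes with the OLE2 signature used by .xls.
    with open(path, "rb") as fh:
        return fh.read(len(OLE2_MAGIC)) == OLE2_MAGIC
```

If `looks_like_excel("Data_Matches_tekha.csv")` returns True, the file is probably a renamed workbook, and read_csv should not be expected to parse it.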
One option is to convert the Excel file to CSV (note: if you have multiple sheets, each sheet should be converted to its own CSV file) and then read the resulting files.
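The conversion can also be done from Python rather than from Excel itself. The following is a minimal sketch, not a definitive implementation: the function name `excel_to_csvs` and the `<workbook>_<sheet>.csv` naming scheme are assumptions made for this example, and it relies on pandas having a suitable Excel engine installed (typically openpyxl for .xlsx files).

```python
from pathlib import Path

import pandas as pd


def excel_to_csvs(xlsx_path, out_dir="."):
    """Write each worksheet of a workbook to its own CSV file."""
    xlsx_path = Path(xlsx_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # sheet_name=None asks pandas for every sheet, returned as {name: DataFrame}.
    sheets = pd.read_excel(xlsx_path, sheet_name=None)

    written = []
    for name, frame in sheets.items():
        # Hypothetical naming scheme: <workbook stem>_<sheet name>.csv
        csv_path = out_dir / f"{xlsx_path.stem}_{name}.csv"
        frame.to_csv(csv_path, index=False)
        written.append(csv_path)
    return written
```

Calling `excel_to_csvs("Data_Matches_tekha.xlsx")` would then return the list of CSV paths it wrote, each of which can be passed to read_csv.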
Alternatively, you can read the Excel file directly with read_excel, or with a library like xlrd, which is designed to read the binary format of Excel files; see "Reading/parsing Excel (xls) files with Python" for more information on that.
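As a rough illustration of reading the workbook directly, the sketch below picks the pandas reader from the file extension. The helper name `read_table` and the set of suffixes are assumptions made for this example. Note that recent pandas versions generally use the openpyxl engine for .xlsx files, and xlrd 2.0 and later reads only the older .xls format.

```python
from pathlib import Path

import pandas as pd

# Extensions treated as Excel workbooks in this sketch.
EXCEL_SUFFIXES = {".xlsx", ".xls"}


def read_table(path, **kwargs):
    """Load a workbook or a delimited text file into a DataFrame."""
    if Path(path).suffix.lower() in EXCEL_SUFFIXES:
        # Workbooks need read_excel; read_csv would fail on their binary content.
        return pd.read_excel(path, **kwargs)
    # Everything else is assumed to be plain delimited text.
    return pd.read_csv(path, **kwargs)
```

With this helper, the call from the question becomes `read_table("Data_Matches_tekha.xlsx", skiprows=2)`, with no sep argument needed.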