How to catch `CParserError` when reading a CSV file


Question

I want to read a list of CSV files into a DataFrame. However, I'm having trouble catching an error that occurs when a file has header rows that do not match the data itself (i.e. metadata or additional blank rows). The error is a 'CParserError' (see the error message at the bottom).

My current solution is to use a try-except statement:

try:
    #read file
except CParserError:
    #give me an error message

However, this fails with the following error:

NameError: name 'CParserError' is not defined

My code is below. As you can see, I think I need multiple except clauses to catch the various errors. The first should check that the default encodings work (the files will never be anything other than utf-8 or latin-1). If there are header rows, pd.read_csv raises a 'CParserError' (see below), which I need to catch. Then, if there are any other miscellaneous issues, I want to catch those too.

Any solutions are welcome, ideally ones that explain why CParserError isn't right, or whether the try-except logic could be amended to avoid relying on it.

Thanks.

import glob
import pandas as pd

files_list = glob.glob('*.csv*')     # get all CSVs
files_dict = {}
for file in files_list:
    try:
        files_dict[file] = pd.read_csv(file, encoding='utf-8')
    except UnicodeDecodeError:
        files_dict[file] = pd.read_csv(file, encoding='Latin-1')
    except CParserError:
        print(file, 'failed: check for header rows')
    except:
        print(file, 'failed: some other error occurred')

The error message when trying to parse a CSV file with header rows:

CParserError                              Traceback (most recent call last)
<ipython-input-15-e454c053d675> in <module>()
----> 1 pd.read_csv('DFA_me_week27.csv')

C:\Users\john.lwli\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, na_fvalues, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, float_precision, nrows, iterator, chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols, infer_datetime_format, skip_blank_lines)
    463                     skip_blank_lines=skip_blank_lines)
    464 
--> 465         return _read(filepath_or_buffer, kwds)
    466 
    467     parser_f.__name__ = name

C:\Users\john.lwli\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    249         return parser
    250 
--> 251     return parser.read()
    252 
    253 _parser_defaults = {

C:\Users\john.lwli\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
    708                 raise ValueError('skip_footer not supported for iteration')
    709 
--> 710         ret = self._engine.read(nrows)
    711 
    712         if self.options.get('as_recarray'):

C:\Users\john.lwli\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
   1157 
   1158         try:
-> 1159             data = self._reader.read(nrows)
   1160         except StopIteration:
   1161             if nrows is None:

pandas\parser.pyx in pandas.parser.TextReader.read (pandas\parser.c:7403)()

pandas\parser.pyx in pandas.parser.TextReader._read_low_memory (pandas\parser.c:7643)()

pandas\parser.pyx in pandas.parser.TextReader._read_rows (pandas\parser.c:8260)()

pandas\parser.pyx in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:8134)()

pandas\parser.pyx in pandas.parser.raise_parser_error (pandas\parser.c:20720)()

CParserError: Error tokenizing data. C error: Expected 2 fields in line 12, saw 12

Recommended answer

I hate to state the obvious, but...

from pandas.parser import CParserError
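As for why the original code fails with a NameError rather than catching anything: Python only looks up the name in an `except` clause when an exception actually reaches that clause, and at that point the name has to exist in scope. The sketch below reproduces this with plain Python; `MissingParserError` is a made-up name standing in for the unimported `CParserError`.

```python
def read_with_undefined_handler():
    """Raise an error inside a try block whose handler names an unknown class."""
    try:
        raise ValueError("Error tokenizing data")
    except MissingParserError:  # hypothetical name, never imported or defined
        return "handled"


def succeed_with_undefined_handler():
    """No exception is raised, so the except clause is never evaluated."""
    try:
        return "ok"
    except MissingParserError:  # same undefined name, but never looked up
        return "handled"


try:
    read_with_undefined_handler()
except NameError as exc:
    # The original ValueError is replaced by a NameError for the handler name.
    outcome = f"NameError: {exc}"

no_error_result = succeed_with_undefined_handler()
print(outcome)
print(no_error_result)
```

This is also why the problem only shows up for files that actually trigger the parser error: for clean files the `except` clause is never evaluated.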

FutureWarning: The pandas.parser module is deprecated and will be removed in a future version. Please import from pandas.io.parser instead.
