Pandas read_csv converter: how to handle exceptions (literal_eval SyntaxError)
Question
I'm reading a csv file into a Pandas DataFrame. The file looks like this:
         A                B
  +--------------+--------------+
0 |              | ("t1", "t2") |
  +--------------+--------------+
1 | ("t3", "t4") |              |
  +--------------+--------------+
Two of the cells contain literal tuples, and the other two are empty.
import ast
import pandas as pd

df = pd.read_csv('my_file.csv', dtype=str, delimiter=',',
                 converters={'A': ast.literal_eval, 'B': ast.literal_eval})
The converter ast.literal_eval works fine for turning the literal tuples into Python tuple objects, but only as long as there are no empty cells. Because I have empty cells, I get the error:
SyntaxError: unexpected EOF while parsing
According to this Stack Overflow answer, I should try to catch the SyntaxError exception raised for empty strings:
ast uses compile to compile the source string (which must be an expression) into an AST. If the source string is not a valid expression (like an empty string), a SyntaxError will be raised by compile.
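The behaviour described in that quote can be checked directly. The following is a minimal sketch, where `classify` is a hypothetical helper introduced only for illustration: it reports which exception type, if any, literal_eval raises for a given string.

```python
from ast import literal_eval

def classify(text):
    """Return the name of the exception literal_eval raises, or 'ok' (hypothetical helper)."""
    try:
        literal_eval(text)
    except (SyntaxError, ValueError) as exc:
        return type(exc).__name__
    return "ok"

# A valid tuple literal, an empty string, and a bare name that is not a literal
samples = ["('t1', 't2')", "", "t1"]
results = {text: classify(text) for text in samples}
print(results)
```

The empty string fails at the compile step with SyntaxError, while a bare name such as t1 compiles but is rejected as a non-literal with ValueError. The exact wording of the SyntaxError message differs between Python versions.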
However, I am not sure how to catch exceptions for individual cells within the context of the read_csv converters.
What would be the best way to go about this? Alternatively, is there some way to convert empty strings/cells into objects that literal_eval would accept or ignore?
NB: I understand that storing literal tuples in readable files isn't always the best approach, but in my case it's useful.
Recommended answer
You can create a custom function that calls ast.literal_eval conditionally:
from ast import literal_eval
from io import StringIO

import pandas as pd

# replicate csv file
x = StringIO("""A,B
,"('t1', 't2')"
"('t3', 't4')",""")

def literal_converter(val):
    # return empty cells unchanged; swap the first `val` for None or
    # another null marker if required
    return val if val == '' else literal_eval(val)

df = pd.read_csv(x, delimiter=',', converters=dict.fromkeys('AB', literal_converter))
print(df)
A B
0 (t1, t2)
1 (t3, t4)
Alternatively, you can use try / except to catch SyntaxError. This solution is more lenient, as it also handles other malformed input, i.e. a SyntaxError or ValueError caused by something other than an empty value.
def literal_converter(val):
    try:
        return literal_eval(val)
    # the parenthesized tuple catches both exception types
    except (SyntaxError, ValueError):
        return val
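The two approaches can also be combined. The sketch below is not part of the original answer: `make_literal_converter` and its `default` parameter are hypothetical names. It returns a chosen placeholder for blank cells and falls back to the raw string when the cell is not a valid literal. Because read_csv passes each cell to a converter as a string, the function can be exercised on plain strings without a file.

```python
from ast import literal_eval

def make_literal_converter(default=None):
    """Build a converter that maps blank cells to `default` (hypothetical helper)."""
    def convert(val):
        # Blank or whitespace-only cells get the placeholder value
        if val.strip() == "":
            return default
        try:
            return literal_eval(val)
        except (SyntaxError, ValueError):
            # Not a valid Python literal: keep the raw cell text
            return val
    return convert

# Use an empty tuple as the placeholder so every parsed cell is a tuple
convert = make_literal_converter(default=())
cells = ["('t3', 't4')", "", "not a literal"]
parsed = [convert(cell) for cell in cells]
print(parsed)
```

To use it with the DataFrame, the same function would be passed as converters={'A': convert, 'B': convert} in the read_csv call. Whether an empty tuple, None, or the empty string is the right placeholder depends on how the columns are consumed later.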