How to ignore invalid lines in a file?
Question
I am iterating over a file:
for line in io.TextIOWrapper(readFile, encoding = 'utf8'):
When the file contains the following line:
b'"""\xea\x11"\t1664\t507\t137\t2\n'
the following exception is raised:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 3: invalid continuation byte
How can I make my script ignore such lines and continue with the good ones?
Recommended answer
If you actually want to ignore the whole line whenever it contains any invalid characters, you have to know that there were invalid characters. That means you can't use TextIOWrapper and instead have to decode the lines manually. What you want to do is this:
for bline in readFile:
    try:
        line = bline.decode('utf-8')
    except UnicodeDecodeError:
        continue
    # do stuff with line
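To see the loop in action, here is a minimal, self-contained sketch. It assumes the file is opened in binary mode; an io.BytesIO object stands in for it here. The two "good" lines are made-up sample data, and only the middle line comes from the question.

```python
import io

# Hypothetical sample data: two valid lines around the invalid line from the question.
raw = (
    b'"good"\t1\t2\t3\t4\n'
    b'"""\xea\x11"\t1664\t507\t137\t2\n'
    b'"also good"\t5\t6\t7\t8\n'
)
read_file = io.BytesIO(raw)  # stands in for a file opened with mode 'rb'

good_lines = []
skipped = 0
for bline in read_file:
    try:
        line = bline.decode('utf-8')
    except UnicodeDecodeError:
        skipped += 1  # the whole line is dropped, not just the bad byte
        continue
    good_lines.append(line)
```

After the loop, good_lines holds only the two decodable lines and skipped counts the one line that was dropped.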
Note, however, that this approach does not give you the same newline behavior as reading a text file. If you need that, you will have to handle it explicitly as well.
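One way to handle newlines explicitly is sketched below. The helper name iter_valid_lines is hypothetical, and it only maps Windows-style '\r\n' endings to '\n'. It does not split on a lone '\r', which text mode with universal newlines would also treat as a line ending, so treat it as a partial approximation rather than a drop-in replacement.

```python
import io

def iter_valid_lines(binary_file, encoding='utf-8'):
    # Hypothetical helper: yield decoded lines, silently skipping any line
    # that cannot be decoded, and normalizing CRLF line endings to LF.
    for bline in binary_file:
        try:
            line = bline.decode(encoding)
        except UnicodeDecodeError:
            continue
        if line.endswith('\r\n'):
            line = line[:-2] + '\n'
        yield line

# Usage with made-up sample data; the middle line is not valid UTF-8.
sample = io.BytesIO(b'first\r\n\xea\x11 bad\r\nlast\n')
lines = list(iter_valid_lines(sample))
```

Wrapping the loop in a generator keeps the skipping and newline logic in one place, so the calling code can iterate over clean text lines much as it would with a text file.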