How to ignore invalid lines in a file?


Problem description

I am iterating over a file:

for line in io.TextIOWrapper(readFile, encoding = 'utf8'):

The problem occurs when the file contains a line like this:

b'"""\xea\x11"\t1664\t507\t137\t2\n'

This raises the following exception:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 3: invalid continuation byte

How can I make my script ignore such lines and continue with the good ones?

Recommended answer

If you actually want to skip the whole line whenever it contains any invalid bytes, you need to know that decoding failed. That means you can't use TextIOWrapper, which decodes for you; instead, iterate over the binary file and decode each line manually:

for bline in readFile:
    try:
        line = bline.decode('utf-8')
    except UnicodeDecodeError:
        continue
    # do stuff with line

Note, however, that this does not give you the same newline behavior as a text file: iterating over a binary file splits lines on \n only and performs no newline translation. If you need that behavior, you have to handle it explicitly as well.
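As a minimal sketch of being explicit about newlines, the loop above can be wrapped in a generator that also rewrites a trailing "\r\n" as "\n". The helper name `iter_valid_lines` and the sample data are assumptions made for illustration. This covers only part of what text mode does: a bare "\r" is still not treated as a line separator.

```python
import io


def iter_valid_lines(binary_file, encoding="utf-8"):
    """Yield decoded lines from a binary file, skipping undecodable ones.

    Hypothetical helper, not part of the standard library.
    """
    for raw in binary_file:
        try:
            line = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # drop the whole line, as in the loop above
        # Emulate part of text-mode newline translation: "\r\n" -> "\n".
        if line.endswith("\r\n"):
            line = line[:-2] + "\n"
        yield line


# Demo: the invalid line from the question sits between two valid lines.
data = (
    b'good\t1\r\n'
    b'"""\xea\x11"\t1664\t507\t137\t2\n'
    b'also good\t2\n'
)
lines = list(iter_valid_lines(io.BytesIO(data)))
print(lines)  # ['good\t1\n', 'also good\t2\n']
```

Here `io.BytesIO` stands in for the binary file object (readFile in the question); any object that yields bytes lines when iterated should work the same way.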
