Python 3 itertools.islice continue despite UnicodeDecodeError

Problem description

I have a Python 3 program that monitors a log file. The log includes, among other things, chat messages written by users. The log is created by a third-party application which I cannot change.

今天,用户写了텋 텋 ",它导致程序崩溃,并出现以下错误:

Today a user wrote "텋��텋��" and it caused the program to crash with the following error:

future: <Task finished coro=<updateConsoleLog() done, defined at /usr/local/src/bserver/logmonitor.py:48> exception=UnicodeDecodeError('utf-8',...
say "\xed\xa0\xbd\xed\xb1\x8c"\r\n', 7623, 7624, 'invalid continuation byte')>
Traceback (most recent call last):
  File "/usr/lib/python3.4/asyncio/tasks.py", line 238, in _step
    result = next(coro)
  File "/usr/local/src/bserver/logmonitor.py", line 50, in updateConsoleLog
    server_events = self.console.getUpdate()
  File "/usr/local/src/bserver/console.py", line 79, in getUpdate
    return self.read()
  File "/usr/local/src/bserver/console.py", line 90, in read
    for line in itertools.islice(log_file, log_no, None):
  File "/usr/lib/python3.4/codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 7623: invalid continuation byte
ERROR:asyncio:Task exception was never retrieved

Using 'file -i log.file' I determined that the log file is us-ascii. This shouldn't be an issue, as ASCII is a subset of UTF-8 (as far as I know).
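
`file` reports a guess based on the bytes it examines, and the log keeps growing, so its us-ascii verdict may not cover the line that crashed the program. One way to double-check is to scan the raw bytes for anything above 127. A minimal sketch; the helper name `first_non_ascii` and the sample bytes are illustrative assumptions, not part of the original program:

```python
import io


def first_non_ascii(stream, chunk_size=65536):
    """Return (offset, byte_value) of the first byte above 127 in a binary stream, or None."""
    offset = 0
    # Read fixed-size chunks until read() returns b'' at end of file.
    for chunk in iter(lambda: stream.read(chunk_size), b''):
        for i, byte in enumerate(chunk):
            if byte > 127:  # outside the 7-bit US-ASCII range
                return offset + i, byte
        offset += len(chunk)
    return None


# On the real log, open it in binary mode ('rb') so nothing gets decoded:
#     with open(path, 'rb') as f:
#         print(first_non_ascii(f))
sample = io.BytesIO(b'line 1\r\nsay "\xed\xa0\xbd"\r\n')  # hypothetical sample contents
print(first_non_ascii(sample))  # (13, 237)
```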

Since this happens rarely and I don't mind losing whatever it is that this user typed, is it possible for me to ignore this line or the particular characters that can't be decoded and just keep on reading the rest of the file?

I considered using try: ... except UnicodeDecodeError as ..., but this would mean I can't read anything in the log file after the error.
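
The reason a try/except cannot simply skip the bad line is that the error is raised while the file object decodes bytes to produce the next line. That happens inside the iterator that `itertools.islice` consumes, not in the loop body, so once the exception propagates the loop is over. A minimal sketch that reproduces this; `io.TextIOWrapper` over `io.BytesIO` stands in for `open(path, 'r', encoding='utf-8')`, and the sample log contents are hypothetical:

```python
import io
import itertools

# Hypothetical log contents: the bytes from the traceback sit on the middle line.
RAW = b'line 1\r\nsay "\xed\xa0\xbd\xed\xb1\x8c"\r\nline 3\r\n'


def read_until_error(raw):
    """Collect lines the way the question's loop does, stopping at a decode error."""
    lines = []
    log_file = io.TextIOWrapper(io.BytesIO(raw), encoding='utf-8')
    try:
        for line in itertools.islice(log_file, 0, None):
            lines.append(line)
    except UnicodeDecodeError as exc:
        # Caught outside the loop, so iteration has already stopped at this point.
        return lines, exc
    finally:
        log_file.close()
    return lines, None


lines, error = read_until_error(RAW)
print(lines)  # 'line 3\n' is never reached
print(error)
```

Lines that precede the bad bytes can be lost too, since the file is decoded in chunks rather than line by line.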

Code

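import itertools

def read(self):
    log_no = self.last_log_no
    server_events = []
    starting_log_no = log_no
    # open() uses the locale's default encoding (UTF-8 on this system, per the traceback)
    with open(self.path, 'r') as log_file:
        # Skip the lines already handled by previous calls
        for line in itertools.islice(log_file, log_no, None):  # UnicodeDecodeError is raised here
            server_events.append(line)
            print(line.replace('\n', '').replace('\r', ''))

            log_no += 1
            self.last_log_no = log_no
    if starting_log_no < log_no:
        return server_events
    return False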

Any help or advice would be appreciated!

Recommended answer

The byte string \xed\xa0\xbd\xed\xb1\x8c is not valid UTF-8. Nor is it US-ASCII, since US-ASCII is a 7-bit encoding: every byte must be at most 127, and \x8c (140), for example, is greater than 127.
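
Both claims can be checked directly in Python. Using the standard `'surrogatepass'` error handler, the sketch below also shows what the six bytes are: UTF-8-style encodings of the two UTF-16 surrogate halves U+D83D and U+DC4C, which strict UTF-8 forbids. Together those halves stand for U+1F44C, an emoji. This suggests, as an assumption the question does not confirm, that the third-party application writes such characters as separately encoded surrogate halves.

```python
BAD = b'\xed\xa0\xbd\xed\xb1\x8c'  # the bytes quoted in the traceback

# US-ASCII only covers 0-127; every one of these bytes is above that.
print(all(b > 127 for b in BAD))  # True

# Strict UTF-8 rejects them: after 0xED only 0x80-0x9F may follow.
decode_error = None
try:
    BAD.decode('utf-8')
except UnicodeDecodeError as exc:
    decode_error = exc
print(decode_error.start, decode_error.reason)  # 0 invalid continuation byte

# 'surrogatepass' lets the decoder accept encoded surrogates instead of failing.
halves = BAD.decode('utf-8', 'surrogatepass')
print([hex(ord(c)) for c in halves])  # ['0xd83d', '0xdc4c']

# Round-tripping through UTF-16 combines the surrogate pair into one code point.
char = halves.encode('utf-16-le', 'surrogatepass').decode('utf-16-le')
print(hex(ord(char)))  # 0x1f44c
```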

Instead of ignoring the UnicodeDecodeError, try opening the file with an encoding that maps every one of the 256 possible byte values to a character (e.g. latin-1):

log_file = open(self.path, 'r', encoding='latin-1')
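
Here is a minimal sketch of why this works, with a second option for comparison. latin-1 maps each byte 0x00-0xFF to the code point of the same value, so decoding can never fail and the bad line just turns into mojibake. If you would rather keep UTF-8 for the rest of the text, the standard `errors='replace'` argument to `open()` substitutes U+FFFD for undecodable bytes. This is a different technique from the answer's, and `errors='ignore'` drops those bytes instead. As above, `io.TextIOWrapper` over `io.BytesIO` stands in for `open()`, and the log contents are hypothetical:

```python
import io
import itertools

RAW = b'line 1\r\nsay "\xed\xa0\xbd\xed\xb1\x8c"\r\nline 3\r\n'  # hypothetical log


def read_lines(raw, start, **open_options):
    """Return lines from index `start` on, decoded with open()-style options."""
    with io.TextIOWrapper(io.BytesIO(raw), **open_options) as log_file:
        return list(itertools.islice(log_file, start, None))


# Every byte value decodes under latin-1, and encoding back restores the bytes.
assert bytes(range(256)).decode('latin-1').encode('latin-1') == bytes(range(256))

latin = read_lines(RAW, 0, encoding='latin-1')
print(latin)  # the middle line is mojibake, but 'line 3\n' is still read

replaced = read_lines(RAW, 0, encoding='utf-8', errors='replace')
print(replaced)  # the bad bytes show up as U+FFFD replacement characters
```

With either option the question's `read()` loop runs to the end of the file. latin-1 is lossless apart from universal-newline translation, so the original bytes can be recovered with `.encode('latin-1')`. `'replace'` and `'ignore'` discard them.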
