UnicodeDecodeError: 'utf8' 编解码器无法解码字节 0x9c [英] UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c

查看:23
本文介绍了UnicodeDecodeError: 'utf8' 编解码器无法解码字节 0x9c的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个套接字服务器,它应该从客户端接收 UTF-8 有效字符.

I have a socket server that is supposed to receive UTF-8 valid characters from clients.

问题是一些客户端(主要是黑客)通过它发送了所有错误类型的数据.

The problem is some clients (mainly hackers) are sending all the wrong kind of data over it.

我可以轻松区分真正的客户端,但我将所有发送的数据记录到文件中,以便稍后进行分析.

I can easily distinguish the genuine client, but I am logging to files all the data sent so I can analyze it later.

有时我会收到这样的 œ 字符,导致 UnicodeDecodeError 错误.

Sometimes I get characters like this œ that cause the UnicodeDecodeError error.

我需要能够制作带有或不带有这些字符的字符串 UTF-8.

I need to be able to make the string UTF-8 with or without those characters.

更新:

对于我的特殊情况,套接字服务是 MTA,因此我只希望接收 ASCII 命令,例如:

For my particular case the socket service was an MTA and thus I only expect to receive ASCII commands such as:

EHLO example.com
MAIL FROM: <john.doe@example.com>
...

我在 JSON 中记录了所有这些.

I was logging all of this in JSON.

然后一些没有好意的人决定发送各种垃圾.

Then some folks out there without good intentions decided to send all kind of junk.

这就是为什么对于我的具体情况,去除非 ASCII 字符是完全可以的.

That is why for my specific case it is perfectly OK to strip the non ASCII characters.

推荐答案

http://docs.python.org/howto/unicode.html#the-unicode-type

str = unicode(str, errors='replace')

str = unicode(str, errors='ignore')

注意: 这将删除(忽略)有问题的字符,返回没有它们的字符串.

对我来说这是理想的情况,因为我使用它来保护我的应用程序不允许的非 ASCII 输入.

或者:使用 codecs 要读入文件的模块:

Alternatively: Use the open method from the codecs module to read in the file:

import codecs
with codecs.open(file_name, 'r', encoding='utf-8',
                 errors='ignore') as fdata:

这篇关于UnicodeDecodeError: 'utf8' 编解码器无法解码字节 0x9c的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆