UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c


Question

I have a socket server that is supposed to receive valid UTF-8 characters from clients.

The problem is that some clients (mainly hackers) are sending all the wrong kinds of data over it.

I can easily distinguish the genuine clients, but I am logging all the data sent to files so I can analyze it later.

Sometimes I get characters like œ that cause a UnicodeDecodeError.

I need to be able to make the string UTF-8 with or without those characters.

Update:

For my particular case the socket service was an MTA, so I only expect to receive ASCII commands such as:

EHLO example.com
MAIL FROM: <john.doe@example.com>
...

I was logging all of this in JSON.

Then some folks out there without good intentions decided to send all kinds of junk.

That is why, for my specific case, it is perfectly OK to strip the non-ASCII characters.

Recommended answer

http://docs.python.org/howto/unicode.html#the-unicode-type

str = unicode(str, errors='replace')  # Python 2: each undecodable byte becomes U+FFFD

or

str = unicode(str, errors='ignore')  # Python 2: undecodable bytes are dropped

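Note: The 'ignore' variant strips out (ignores) the offending characters and returns the string without them.

For me this is the ideal case, since I am using it as protection against non-ASCII input, which my application does not allow.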
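The calls above are Python 2. In Python 3 there is no unicode() built-in, but bytes.decode accepts the same two error handlers. The following is a minimal sketch under that assumption; the helper name to_log_text and the sample payload are invented for illustration and do not come from the original answer:

```python
import json

def to_log_text(raw: bytes, keep_marker: bool = True) -> str:
    """Decode untrusted bytes without ever raising UnicodeDecodeError."""
    # 'replace' substitutes U+FFFD for each bad byte, 'ignore' drops it.
    handler = "replace" if keep_marker else "ignore"
    return raw.decode("utf-8", errors=handler)

# Invented sample: a lone 0x9c byte is not valid UTF-8.
payload = b"EHLO example\x9c.com"

marked = to_log_text(payload)                       # bad byte becomes U+FFFD
stripped = to_log_text(payload, keep_marker=False)  # bad byte is removed

# Either result is an ordinary str, so it can go into a JSON log record.
record = json.dumps({"command": stripped})
```

Decoding with "ascii" instead of "utf-8" would also drop every valid non-ASCII character, which matches the MTA case described in the question.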

Alternatively: Use the open method from the codecs module to read in the file:

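import codecs
# errors='ignore' skips bytes that are not valid UTF-8 instead of raising
with codecs.open(file_name, 'r', encoding='utf-8',
                 errors='ignore') as fdata:
    text = fdata.read()  # file_name is assumed to be defined by the caller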
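In Python 3 the built-in open takes the same encoding and errors arguments, so the codecs module is not needed for this. A small sketch, assuming Python 3; the helper name read_text_lossy and the temporary sample file are hypothetical and exist only to make the example self-contained:

```python
import os
import tempfile

def read_text_lossy(path: str) -> str:
    """Read a file as UTF-8, silently skipping bytes that do not decode."""
    with open(path, "r", encoding="utf-8", errors="ignore") as fdata:
        return fdata.read()

# Build a throwaway file containing one invalid byte (0x9c) to exercise the helper.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"MAIL FROM: <john.doe\x9c@example.com>\n")
    sample_path = tmp.name

content = read_text_lossy(sample_path)
os.remove(sample_path)
```

Passing errors="replace" instead would keep a visible U+FFFD marker wherever a byte was rejected, which may be more useful when the log is analyzed later.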
