Decoding double-encoded UTF-8 in Python

Question

I've got a problem with strings that I receive from one of my clients over XML-RPC. He sends me UTF-8 strings that are encoded twice :( so when I get them in Python I have a unicode object that has to be decoded one more time, but obviously Python doesn't allow that. I've notified my client, but I need a quick workaround for now until he fixes it.

Raw string from a TCP dump:

<string>Rafa\xc3\x85\xc2\x82</string>

This gets converted into:

u'Rafa\xc5\x82'

The best we have come up with is:

eval(repr(u'Rafa\xc5\x82')[1:]).decode("utf8") 

This results in the correct string, which is:

u'Rafa\u0142' 

This works, but it is ugly as hell and cannot be used in production code. If anyone knows how to fix this problem in a more suitable way, please write. Thanks, Chris

Recommended answer


>>> s = u'Rafa\xc5\x82'
>>> s.encode('raw_unicode_escape').decode('utf-8')
u'Rafa\u0142'
>>>
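
Why this works: every character in the broken string has a code point below 256, and for those code points raw_unicode_escape emits the same single bytes as Latin-1. Encoding therefore recovers the original UTF-8 bytes, which can then be decoded properly. Below is a minimal Python 3 sketch of the same idea. The helper name fix_double_encoded is hypothetical, and the sketch assumes the extra layer came from UTF-8 bytes being misread as Latin-1, which is what the \x82 in the example suggests. It uses latin-1 instead of raw_unicode_escape so that a string containing characters above U+00FF fails loudly and is returned unchanged instead of being turned into \uXXXX escape text.

```python
def fix_double_encoded(text: str) -> str:
    """Undo one extra UTF-8 layer, assuming the bytes were misread as Latin-1.

    Hypothetical helper: returns the input unchanged when it does not look
    double-encoded (not Latin-1 encodable, or not valid UTF-8 afterwards).
    """
    try:
        # Latin-1 maps code points 0-255 one-to-one onto bytes 0-255,
        # so this recovers the original UTF-8 byte sequence.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Reproduce the problem: the bytes seen on the wire in the question.
wire_bytes = b"Rafa\xc3\x85\xc2\x82"
broken = wire_bytes.decode("utf-8")   # what the XML-RPC layer hands over
fixed = fix_double_encoded(broken)

print(broken == "Rafa\xc5\x82")
print(fixed == "Rafa\u0142")
```

Both comparisons should print True. Returning the input unchanged on failure is a design choice for a temporary workaround: strings that arrive correctly encoded (for example after the client fixes the bug) will usually pass through untouched, although a correct string that happens to be valid UTF-8 when re-read as Latin-1 bytes would still be altered.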
