Decoding double encoded utf8 in Python
Question
I've got a problem with strings that I get from one of my clients over XML-RPC. He sends me UTF-8 strings that are encoded twice :( so when I get them in Python I have a unicode object that has to be decoded one more time, but obviously Python doesn't allow that. I've notified my client, but I need a quick workaround for now, until he fixes it.
Raw string from tcp dump:
<string>Rafa\xc3\x85\xc2\x82</string>
This gets converted to:
u'Rafa\xc5\x82'
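To see where that value comes from, the following Python 3 sketch reproduces the bytes from the dump. It assumes (the question does not confirm this) that the client's correct UTF-8 bytes were mistakenly read as Latin-1 text and then encoded to UTF-8 a second time; the variable names are illustrative only.

```python
# Reproduce the double encoding seen in the TCP dump (Python 3).
original = "Rafa\u0142"  # "Rafał", what the client meant to send

once = original.encode("utf-8")  # b'Rafa\xc5\x82', correct UTF-8
# Assumed bug: the UTF-8 bytes are treated as Latin-1 text and encoded again.
twice = once.decode("latin-1").encode("utf-8")
print(twice)  # b'Rafa\xc3\x85\xc2\x82', the same bytes as in the dump

# The XML parser decodes the payload only once, leaving the broken string.
received = twice.decode("utf-8")
print(received == "Rafa\xc5\x82")  # True, i.e. u'Rafa\xc5\x82' from the question
```

Each non-ASCII character of the real text becomes two code points below U+0100 after the extra round, which is why the received string is longer than the intended one.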
The best we've come up with is:
eval(repr(u'Rafa\xc5\x82')[1:]).decode("utf8")
This results in the correct string:
u'Rafa\u0142'
This works, but it is ugly as hell and cannot be used in production code. If anyone knows how to fix this problem in a more suitable way, please write. Thanks, Chris
Recommended answer
>>> s = u'Rafa\xc5\x82'
>>> s.encode('raw_unicode_escape').decode('utf-8')
u'Rafa\u0142'
>>>
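The session above is Python 2. A minimal Python 3 sketch of the same idea follows; `fix_double_utf8` is a hypothetical helper name, and it swaps in the `latin-1` codec for `raw_unicode_escape`. Both codecs turn code points U+0000 to U+00FF back into the same single bytes, but `latin-1` raises `UnicodeEncodeError` on anything higher, whereas `raw_unicode_escape` would emit a `\uXXXX` escape instead. That failure is used here to leave strings that were never double encoded untouched.

```python
def fix_double_utf8(text: str) -> str:
    """Undo one extra round of UTF-8 encoding (hypothetical helper).

    Assumes each character of a double-encoded string is below U+0100,
    i.e. it really stands for one byte of the intended UTF-8 data.
    """
    try:
        # latin-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
        # recovering the original UTF-8 bytes, which are then decoded.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Contains code points above U+00FF, or the bytes are not valid
        # UTF-8: treat the string as already correct and return it as is.
        return text


print(fix_double_utf8("Rafa\xc5\x82"))  # Rafał
print(fix_double_utf8("Rafa\u0142"))    # Rafał (already correct, unchanged)
```

This is a heuristic, suitable only as a stopgap: a correctly decoded string consisting solely of Latin-1 characters that also happens to form valid UTF-8 (for example `"Ã©"`) would be "fixed" into `"é"` as well, so the workaround is best removed once the client sends singly encoded data.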