How can I decode a UTF-8 byte array to a string in Python 2?

Question

I have an array of bytes representing a UTF-8 encoded string. I want to decode these bytes back into a string in Python 2. My overall program relies on Python 2, so I cannot switch to Python 3.

array = [67, 97, 102, -61, -87, 32, 70, 108, 111, 114, 97]  # -61, -87 are the two bytes of "é"

-> Café Flora

Since not every character in the string I want is represented by exactly one byte in the array, I cannot use a solution like:

"".join(map(chr, array))

I tried to write a function that steps through the array and, whenever it encounters a number outside the range 0-127 (ASCII), creates a new 16-bit int, shifts the current bits 8 to the left, and then adds the following byte using a bitwise OR. Finally, it would use unichr() to decode the result.

result = []


for i in range(len(byte_array)):
    x = byte_array[i]
    if x < 0:
        b16 = x & 0xFFFF # 16 bit
        b16 = b16 << 8
        b16 = b16 | byte_array[i+1]
        result.append(unichr(m16))
    else:
        result.append(chr(x))

return "".join(result)

However, this did not work.

The following article explains the issue very well, and includes a nodeJS solution:

Recommended answer

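You can use struct.pack for this. The "b" format code packs each value as a signed char, so the negative numbers become the corresponding raw bytes, and the resulting byte string can then be decoded as UTF-8:

>>> import struct
>>> a = [67, 97, 102, -61, -87, 32, 70, 108, 111, 114, 97]
>>> struct.pack("b"*len(a), *a)
'Caf\xc3\xa9 Flora'
>>> print struct.pack("b"*len(a), *a).decode('utf8')
Café Flora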
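
As an alternative sketch that is not part of the original answer, the same result can probably be reached without struct by masking each signed value to its unsigned form and building a bytearray. The helper name decode_signed_utf8 is hypothetical, and the sketch assumes every input value lies in the signed-byte range -128..127 and that the bytes form valid UTF-8.

```python
def decode_signed_utf8(signed_bytes):
    """Decode a sequence of signed byte values (-128..127) as UTF-8 text.

    Hypothetical helper for illustration; not from the original answer.
    """
    # Masking with 0xFF maps each signed value to its unsigned equivalent,
    # e.g. -61 -> 195 (0xC3) and -87 -> 169 (0xA9).
    raw = bytearray(b & 0xFF for b in signed_bytes)
    # bytearray.decode returns a unicode object on Python 2 (str on Python 3).
    return raw.decode('utf-8')


a = [67, 97, 102, -61, -87, 32, 70, 108, 111, 114, 97]
text = decode_signed_utf8(a)
```

Letting the UTF-8 codec do the decoding avoids hand-written bit shifting, which also has to handle three-byte and four-byte sequences correctly.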
