How can I decode a UTF-8 byte array to a string in Python 2?
Question
I have an array of bytes representing a UTF-8 encoded string. I want to decode these bytes back into the string in Python 2. My overall program relies on Python 2, so I cannot switch to Python 3.
array = [67, 97, 102, -61, -87, 32, 70, 108, 111, 114, 97]  # -61, -87 together encode "é"
-> Café Flora
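For reference, the conversion shown above can be sketched as follows. This is a minimal sketch, assuming the negative numbers are signed byte values (as a Java `byte[]` would produce); the helper name `decode_signed_utf8` is hypothetical. Masking with `0xFF` maps each value into 0..255, and decoding the whole buffer at once keeps multi-byte sequences together.

```python
# Signed byte values as given in the question (range -128..127).
array = [67, 97, 102, -61, -87, 32, 70, 108, 111, 114, 97]


def decode_signed_utf8(values):
    """Decode a list of signed byte values as UTF-8 (hypothetical helper)."""
    # v & 0xFF maps -128..-1 onto 128..255 and leaves 0..127 unchanged,
    # so -61 becomes 0xC3 and -87 becomes 0xA9.
    raw = bytearray(v & 0xFF for v in values)
    # Decode the whole buffer in one call so that the two bytes
    # 0xC3 0xA9 are interpreted together as a single character.
    return raw.decode("utf-8")


text = decode_signed_utf8(array)
```

In Python 2 the result is a `unicode` object, so printing it may require an explicit `.encode(...)` depending on the terminal encoding.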
Since not every character in the string is represented by exactly one byte in the array, I cannot use a solution like:
"".join(map(chr, array))
I tried to write a function that steps through the array and, whenever it encounters a number outside the range 0-127 (ASCII), creates a new 16-bit int, shifts the current bits 8 places to the left, and then adds the following byte using a bitwise OR. Finally, it uses unichr() to decode the result.
def decode(byte_array):
    result = []
    for i in range(len(byte_array)):
        x = byte_array[i]
        if x < 0:
            b16 = x & 0xFFFF  # 16 bit
            b16 = b16 << 8
            b16 = b16 | byte_array[i+1]
            result.append(unichr(b16))
        else:
            result.append(chr(x))
    return "".join(result)
However, this was not successful.
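One likely reason the attempt fails is that UTF-8 does not simply concatenate two 8-bit values. In a two-byte sequence the lead byte has the form `110xxxxx` and the continuation byte `10xxxxxx`, so only the `x` bits are combined, and the continuation byte must then be skipped. The following is an illustrative sketch of that bit layout, not a full validating decoder: it handles 1- to 3-byte sequences only, does not reject overlong encodings, and the name `decode_utf8_manually` is hypothetical. In practice the built-in UTF-8 codec is the safer choice.

```python
try:
    unichr  # exists in Python 2
except NameError:
    unichr = chr  # Python 3 fallback so the sketch runs on both


def decode_utf8_manually(values):
    """Hand-rolled UTF-8 decoder for 1- to 3-byte sequences (illustrative)."""
    data = [v & 0xFF for v in values]  # signed -> unsigned (0..255)
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:             # 0xxxxxxx: plain ASCII
            code, extra = lead, 0
        elif lead >> 5 == 0b110:    # 110xxxxx: one continuation byte
            code, extra = lead & 0x1F, 1
        elif lead >> 4 == 0b1110:   # 1110xxxx: two continuation bytes
            code, extra = lead & 0x0F, 2
        else:
            raise ValueError("unsupported or invalid lead byte at index %d" % i)
        if i + extra >= len(data):
            raise ValueError("truncated sequence at index %d" % i)
        for b in data[i + 1:i + 1 + extra]:
            if b >> 6 != 0b10:      # continuation bytes look like 10xxxxxx
                raise ValueError("invalid continuation byte")
            code = (code << 6) | (b & 0x3F)  # append the 6 payload bits
        out.append(unichr(code))
        i += extra + 1              # skip the bytes just consumed
    return u"".join(out)


result = decode_utf8_manually([67, 97, 102, -61, -87, 32, 70, 108, 111, 114, 97])
```

For the pair -61, -87 this gives `(0x03 << 6) | 0x29 = 0xE9`, the code point of "é".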
The following article explains the issue very well and includes a Node.js solution: