Why can't I decode \xDF (ß) into UTF-8?
I have a bytestring b"\xDF". When I try to decode it as UTF-8, a UnicodeDecodeError is raised. Decoding it as CP1252 works fine. In both charsets, 0xDF is represented by the character "ß". So why the error?
>>> hex(ord("ß"))
'0xdf'
>>> b"\xDF".decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdf in position 0: unexpected end of data
>>> b"\xDF".decode("cp1252")
'ß'
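All single-byte encoded characters in UTF-8 have to be in the range [0x00 .. 0x7F] (https://en.wikipedia.org/wiki/UTF-8). Those are equivalent to 7-bit ASCII. The value 0xDF returned by hex(ord("ß")) is the Unicode code point U+00DF, not its UTF-8 encoding; CP1252 just happens to store that code point as the single byte 0xDF. In UTF-8, a lone 0xDF byte (binary 11011111) announces a two-byte sequence, so the decoder reports "unexpected end of data" when no second byte follows.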
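To see how a UTF-8 decoder views an individual byte, here is a minimal sketch that classifies a byte value by the range it falls in. The function name utf8_byte_kind is made up for illustration; the ranges follow the UTF-8 definition.

```python
def utf8_byte_kind(byte: int) -> str:
    """Classify a single byte value by its role in a UTF-8 sequence.

    Hypothetical helper for illustration only; not part of the standard library.
    """
    if byte <= 0x7F:
        return "single-byte (ASCII)"
    if 0x80 <= byte <= 0xBF:
        return "continuation byte"
    if 0xC2 <= byte <= 0xDF:
        return "lead byte of a 2-byte sequence"
    if 0xE0 <= byte <= 0xEF:
        return "lead byte of a 3-byte sequence"
    if 0xF0 <= byte <= 0xF4:
        return "lead byte of a 4-byte sequence"
    # 0xC0, 0xC1 and 0xF5..0xFF never appear in valid UTF-8
    return "never valid in UTF-8"


print(utf8_byte_kind(0xDF))  # lead byte of a 2-byte sequence
print(utf8_byte_kind(0x41))  # single-byte (ASCII)
```

So 0xDF on its own is only the start of a character, which matches the "unexpected end of data" message in the traceback.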
For the German ß, you'd get 2 bytes in UTF-8:
>>> "ß".encode("utf-8")
b'\xc3\x9f'
These two bytes also decode correctly:
>>> b'\xc3\x9f'.decode("utf-8")
'ß'
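Where the bytes 0xC3 0x9F come from can be shown by doing the two-byte encoding by hand. This is a rough sketch, assuming the code point lies in U+0080..U+07FF; encode_two_byte is a hypothetical helper, and in real code you would simply call str.encode("utf-8").

```python
def encode_two_byte(code_point: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as two UTF-8 bytes (illustration only)."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("only code points U+0080..U+07FF take two bytes")
    # Lead byte: bit pattern 110xxxxx carrying the top 5 bits of the code point
    lead = 0b11000000 | (code_point >> 6)
    # Continuation byte: bit pattern 10xxxxxx carrying the low 6 bits
    continuation = 0b10000000 | (code_point & 0b00111111)
    return bytes([lead, continuation])


encoded = encode_two_byte(ord("ß"))  # ord("ß") == 0xDF
print(encoded)                       # b'\xc3\x9f'
print(encoded.decode("utf-8"))       # ß
```

If the original data really is CP1252, decoding it with "cp1252" and re-encoding the resulting string with "utf-8" yields these same two bytes.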