Python: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Question

I am fetching data from a catalog, and it returns the data as bytes.

Bytes data:

b'\x80\x00\x00\x00\n\x00\x00%\x83\xa0\x08\x01\x00\xbb@\x00\x00\x05p 
\x02\x00>\xf3\x00\x00\x00}\x02\x00`\x03\xef0\x00\x00\r\xc0 
\x06\xf0>\xf3\x00\x00\x02\x88\x02\x03\xec\x03\xef0\x00\x00/.....'

While converting this data to a string or any other readable format, I get this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Code I used (Python 3.7.3):

blobs = blob.decode('utf-8')

AND

import json
json.dumps(blob.decode())

I've also tried pickle, ast, and pprint, but they did not help here.

Recommended answer

The UTF-8 encoding has some built-in redundancy that serves at least two purposes:

1) telling start bytes from continuation bytes

Start bytes match one of these 4 patterns (in binary; the dots mark the bits carrying actual data):

0.......
110.....
1110....
11110...

whereas continuation bytes (0 to 3 of them follow each start byte) always have this form:

10......

2) checking for validity

If this encoding is not respected, it is safe to say that the data is not UTF-8, e.g. because it was corrupted during a transfer.
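The bit patterns above can be checked directly with bit masks. The following is a minimal sketch, not a full validator: the helper name utf8_byte_kind is made up for illustration, and it tests only the patterns listed above, not the further restrictions a real decoder applies (overlong forms, surrogates, code points beyond U+10FFFF).

```python
def utf8_byte_kind(byte: int) -> str:
    """Classify one byte value (0-255) by its UTF-8 bit pattern.

    Hypothetical helper for illustration; it checks patterns only.
    """
    if byte & 0b1000_0000 == 0b0000_0000:   # 0.......
        return "start (1-byte character)"
    if byte & 0b1100_0000 == 0b1000_0000:   # 10......
        return "continuation"
    if byte & 0b1110_0000 == 0b1100_0000:   # 110.....
        return "start (2-byte character)"
    if byte & 0b1111_0000 == 0b1110_0000:   # 1110....
        return "start (3-byte character)"
    if byte & 0b1111_1000 == 0b1111_0000:   # 11110...
        return "start (4-byte character)"
    return "invalid"                        # 11111...


# A few bytes taken from the data in the question.
for value in b"\x80\x00%\x83":
    print(f"0x{value:02x} {value:08b} {utf8_byte_kind(value)}")
```

Iterating over a bytes object in Python 3 yields integers, which is why the helper takes an int.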

Why is it possible to say that data starting with b'\x80\x00' cannot be UTF-8? The encoding is violated at the very first byte: 0x80 is 10000000 in binary, which is the continuation-byte pattern, so it cannot start a character. This is exactly what your error message says:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

And even if you skip this one, you hit the same problem a few bytes later at b'%\x83', where 0x83 is again a continuation byte with no start byte before it. So most likely you are either decoding the wrong data or assuming the wrong encoding.
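If the blob is binary data rather than text in some other encoding (an assumption; the question does not say what format the catalog uses), it can still be inspected and serialized without decoding it as UTF-8. The sketch below uses only the first bytes shown in the question, and the key name blob_base64 is an arbitrary choice for illustration.

```python
import base64
import json

# First bytes of the blob from the question (the rest is elided there).
blob = b"\x80\x00\x00\x00\n\x00\x00%\x83\xa0\x08\x01"

error_position = None
try:
    blob.decode("utf-8")
except UnicodeDecodeError as exc:
    # exc.start is the index of the first offending byte.
    error_position = exc.start
    print(f"not UTF-8: byte 0x{blob[exc.start]:02x} at position {exc.start}")

# errors="replace" never raises; each U+FFFD marks a spot that was not valid UTF-8.
lossy = blob.decode("utf-8", errors="replace")
print(lossy.count("\ufffd"), "replacement characters")

# Lossless, printable views of arbitrary binary data.
hex_view = blob.hex()
b64_view = base64.b64encode(blob).decode("ascii")
print(hex_view)

# Base64 text is safe to embed in JSON, unlike raw bytes.
print(json.dumps({"blob_base64": b64_view}))
```

Both the hex and the Base64 views can be turned back into the original bytes with bytes.fromhex and base64.b64decode. Making sense of the content itself still requires knowing the format the producer used.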
