UnicodeDecodeError while processing accented words
Question
I have a Python script that reads a YAML file (it runs on an embedded system). Without accents, the script runs normally on both my development machine and the embedded system. With accented words, however, it crashes with
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6: ordinal not in range(128)
only in the embedded environment.
YAML sample:
data: ã
The snippet that reads the YAML:
with open(YAML_FILE, 'r') as stream:
    try:
        data = yaml.load(stream)
    except yaml.YAMLError:  # handler not shown in the original post
        raise
I tried a number of solutions without success.
Versions: Python 3.6, PyYAML 3.12
Recommended answer
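The codec that decodes your bytes has been set to ASCII, which only accepts byte values between 0 and 127. Because open() is called without an encoding argument, Python falls back to the locale's preferred encoding, and on the embedded system that is most likely ASCII.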
Accented characters are encoded in UTF-8 as bytes outside this range (ã is stored as 0xC3 0xA3), so the ASCII codec fails on the first of them. That is the byte 0xc3 reported in the error.
A UTF-8 codec decodes plain ASCII as well as UTF-8, because ASCII is, by design, a (very small) subset of UTF-8.
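To make the byte-level picture concrete, here is a minimal sketch that reproduces the failure without any file or YAML library. It assumes the file on disk is UTF-8 encoded, which the 0xc3 in the error message suggests; the variable names are only for illustration.

```python
# "data: ã" as it would be stored on disk in a UTF-8 encoded file.
raw = "data: ã".encode("utf-8")

print(raw)          # b'data: \xc3\xa3'
print(hex(raw[6]))  # 0xc3, the byte named in the error message

# The ASCII codec rejects any byte above 127.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)

# The UTF-8 codec handles both the accented text and plain ASCII bytes.
decoded = raw.decode("utf-8")
ascii_only = b"data: a".decode("utf-8")
print(decoded, ascii_only)
```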
If you change the codec so that the file is decoded as UTF-8, it should work.
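One way to do that is to pass encoding explicitly when opening the file. The sketch below uses only the standard library so that it runs anywhere: a temporary file stands in for YAML_FILE (a hypothetical substitute), and the point where the original script would call yaml.load(stream) is marked in a comment.

```python
import os
import tempfile

# Hypothetical stand-in for YAML_FILE: a temporary file holding the sample.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".yaml", delete=False
) as tmp:
    tmp.write("data: ã\n")
    yaml_file = tmp.name

try:
    # encoding="utf-8" overrides the locale default, so the result no
    # longer depends on how the machine is configured.
    with open(yaml_file, "r", encoding="utf-8") as stream:
        text = stream.read()  # the original script would call yaml.load(stream) here
finally:
    os.remove(yaml_file)

print(text)
```

In the script from the question, the only change needed under this approach would be adding encoding='utf-8' to the existing open() call.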
In general, you should always specify how a byte stream is to be decoded to text; otherwise, the interpretation of the stream is ambiguous.
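To see which encoding a given machine would pick when none is specified, the following sketch may help. It assumes the Python 3.6 behavior from the question, where open() in text mode defaults to locale.getpreferredencoding(False); running it on both the development machine and the embedded system should show whether the two differ.

```python
import codecs
import locale

# Encoding that open() uses in text mode when no encoding= is given.
default_encoding = locale.getpreferredencoding(False)

# Some systems report ASCII under an alias such as 'ANSI_X3.4-1968';
# codecs.lookup() resolves it to Python's canonical codec name.
canonical_name = codecs.lookup(default_encoding).name

print("locale preferred encoding:", default_encoding)
print("canonical codec name:", canonical_name)
```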