UnicodeDecodeError while processing accented words

Question

I have a Python script that reads a YAML file (it runs on an embedded system). Without accents, the script runs normally on both my development machine and the embedded system. But accented words make it crash with

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6: ordinal not in range(128)

only in the embedded environment.

YAML sample:

data: ã

The snippet which reads the YAML:

with open(YAML_FILE, 'r') as stream:
  try:
    data = yaml.load(stream)

I tried a bunch of solutions without success.

Versions: Python 3.6, PyYAML 3.12

Recommended answer

The codec that is reading your bytes has been set to ASCII. This restricts you to byte values between 0 and 127.

Accented characters are encoded with bytes outside this range, so you get a decoding error. The byte 0xc3 in your error message is the first byte of the UTF-8 encoding of ã (0xC3 0xA3), and it sits at position 6 of the line "data: ã".
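As a rough illustration of the point above, the following sketch decodes the sample line from the question with both codecs. The helper name `try_decode` is made up for this example and is not part of the original script.

```python
# The sample line from the question, as the UTF-8 bytes stored on disk.
raw = b'data: \xc3\xa3'

def try_decode(data, codec):
    """Return the decoded text, or the error message if decoding fails.

    Hypothetical helper, for illustration only.
    """
    try:
        return data.decode(codec)
    except UnicodeDecodeError as exc:
        return 'error: {}'.format(exc)

ascii_result = try_decode(raw, 'ascii')   # fails on byte 0xc3
utf8_result = try_decode(raw, 'utf-8')    # decodes to 'data: ã'

print(ascii_result)
# ascii() keeps the output printable even on an ASCII-only terminal.
print(ascii(utf8_result))
```

The ASCII attempt should report the same message as in the question, while the UTF-8 attempt succeeds on the same bytes.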

A UTF-8 codec decodes ASCII as well as UTF-8, because ASCII is, by design, a (very small) subset of UTF-8.

If you can change your codec to decode as UTF-8, it should work.
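One way to do that, sketched here under the assumption that the file on disk really is UTF-8, is to pass `encoding='utf-8'` to `open()`. The function name `read_utf8_text` and the temporary sample file are invented for this example; only the standard library is used so the sketch runs without PyYAML.

```python
import os
import tempfile

def read_utf8_text(path):
    """Read a text file, decoding explicitly as UTF-8 regardless of locale.

    Hypothetical helper, for illustration only.
    """
    with open(path, 'r', encoding='utf-8') as stream:
        return stream.read()

# Write the sample from the question as UTF-8 bytes, then read it back.
fd, sample_path = tempfile.mkstemp(suffix='.yaml')
with os.fdopen(fd, 'wb') as f:
    f.write(b'data: \xc3\xa3\n')

text = read_utf8_text(sample_path)
os.remove(sample_path)
```

In the snippet from the question, the equivalent change would be `open(YAML_FILE, 'r', encoding='utf-8')` before handing `stream` to `yaml.load`.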

In general, you should always specify how you will decode a byte stream to text; otherwise, your stream could be ambiguous.
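To see why the same script might behave differently on two machines, it can help to print the encoding that `open()` falls back to when none is given. On Python 3.6 that default is `locale.getpreferredencoding(False)`. The sketch below only reports the value; the assumption that the embedded system runs under a minimal locale such as C or POSIX (which typically reports an ASCII codec) is a guess, not something stated in the question.

```python
import locale

def default_open_encoding():
    """Return the encoding open() uses in text mode when none is specified.

    Hypothetical helper, for illustration only.
    """
    return locale.getpreferredencoding(False)

encoding = default_open_encoding()
print(encoding)
```

If the development machine and the embedded system print different values, that difference would explain the crash, and an explicit `encoding` argument removes the dependency on the locale.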
