'ascii' codec can't decode byte 0xcb while doing bs

Problem description

I save an XML page locally from Merriam-Webster's API. Here is the URL: http://www.dictionaryapi.com/api/v1/references/collegiate/xml/apple?key=bf534d02-bf4e-49bc-b43f-37f68a0bf4fd

That is just one example. I fetch the page with urlretrieve and save it as an XML file.

Now I want to open it, but a UnicodeDecodeError occurs.

Here is what I did:

page = open('test.xml')
bs = BeautifulSoup(page)

Then I get the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xcb
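
The error can be reproduced without BeautifulSoup. This is a minimal Python 3 sketch, assuming the offending byte 0xcb is the first byte of a UTF-8 encoded character. The sample text is hypothetical and not taken from the real API response: it uses the IPA stress mark ˈ (U+02C8), whose UTF-8 encoding is 0xcb 0x88, as a plausible example of what a dictionary pronunciation might contain.

```python
# Hypothetical sample text (not from the real file): the IPA stress mark
# U+02C8 followed by "ap-", the schwa U+0259 and "l", encoded as UTF-8.
raw = "\u02c8ap-\u0259l".encode("utf-8")


def try_decode(data, codec):
    """Return the decoded text, or the UnicodeDecodeError the codec raises."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError as exc:
        return exc


ascii_result = try_decode(raw, "ascii")  # fails on the first byte, 0xcb
utf8_result = try_decode(raw, "utf-8")   # succeeds

print(raw[:2])             # b'\xcb\x88'
print(ascii_result)        # 'ascii' codec can't decode byte 0xcb in position 0: ordinal not in range(128)
print(ascii(utf8_result))  # '\u02c8ap-\u0259l'
```

The ASCII codec only accepts bytes below 128, so any multi-byte UTF-8 character makes it fail, while decoding the same bytes as UTF-8 succeeds.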

I tried passing the path as a unicode string, u'test.xml', but it didn't work.

sys.getdefaultencoding() returns 'utf-8'.

The encoding configuration is already utf-8, so changing it doesn't solve the problem. Thanks for the advice anyway.

Recommended answer

You need to specify the encoding as utf-8, which is how the data is encoded. The filename has nothing to do with what is inside the file, so prefixing it with u to make a unicode string is not going to help:

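import io
from bs4 import BeautifulSoup  # assuming BeautifulSoup 4 (bs4)

# Decode the file's bytes as UTF-8 instead of relying on the default codec.
with io.open('test.xml', encoding="utf-8") as page:
    bs = BeautifulSoup(page)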
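
The effect of the encoding argument can be checked with the standard library alone. This Python 3 sketch writes a small hypothetical XML sample as UTF-8 to a temporary file (the file name test.xml and its content are assumptions, not the real API response), reads it back with io.open(..., encoding="utf-8"), and shows that reading the same bytes as ASCII fails.

```python
import io
import os
import tempfile

# Hypothetical stand-in for the saved API response: a tiny XML document
# containing a non-ASCII pronunciation (assumed, not the real file).
SAMPLE_XML = '<?xml version="1.0" encoding="utf-8"?>\n<pr>\u02c8ap-\u0259l</pr>\n'


def read_xml_text(path, encoding="utf-8"):
    """Read the whole file as text, decoding its bytes with `encoding`."""
    with io.open(path, encoding=encoding) as page:
        return page.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.xml")
    # Write the sample as UTF-8 bytes, as the downloaded file would be.
    with io.open(path, "w", encoding="utf-8") as out:
        out.write(SAMPLE_XML)

    text = read_xml_text(path)  # correct codec: decodes cleanly
    try:
        read_xml_text(path, encoding="ascii")  # same bytes, wrong codec
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True

print(text == SAMPLE_XML)  # True
print(ascii_failed)        # True
```

The decoded string, or the open file handle as in the answer above, can then be passed to BeautifulSoup.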
