"illegal multibyte sequence" error from BeautifulSoup when using Python 3

Views: 755 Published: 2020/5/25 1:17:08 Tags: python parsing web-scraping unicode beautifulsoup


Problem description

I have an .html file saved to local disk, and I am using BeautifulSoup (bs4) to parse it.

It all worked fine until I recently switched to Python 3.

I tested the same .html file on another machine running Python 2; it works there and returns the page contents.

soup = BeautifulSoup(open('page.html'), "lxml")

The machine with Python 3 doesn't work, and it reports:

UnicodeDecodeError: 'gbk' codec can't decode byte 0x92 in position 298670: illegal multibyte sequence
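
The traceback suggests that Python 3 decoded the file as GBK, which is what text-mode open() does when GBK is the locale's preferred encoding and no encoding argument is given. Byte 0x92 is a curly apostrophe in Windows-1252, so one plausible reading (an assumption; the question does not state the file's real encoding) is that the page is not GBK at all. A minimal sketch with hypothetical sample bytes, using only the standard library:

```python
# Hypothetical sample: "users' guide" with a Windows-1252 curly apostrophe
# (byte 0x92) followed by a space. In GBK, 0x92 is a lead byte and a space
# is not a valid trail byte, so the pair cannot be decoded.
raw = b"users\x92 guide"


def try_decode(data, encoding):
    """Return the decoded text, or the UnicodeDecodeError that was raised."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


gbk_result = try_decode(raw, "gbk")        # fails on byte 0x92
cp1252_result = try_decode(raw, "cp1252")  # decodes 0x92 as U+2019

print(type(gbk_result).__name__)
print(ascii(cp1252_result))
```

The same bytes fail under one codec and decode cleanly under another, which is why the choice of encoding at open() time matters more than the choice of parser.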

I searched around and tried the following, but none of them worked (using 'r' or 'rb' doesn't make a big difference):

soup = BeautifulSoup(open('page.html', 'r'), "lxml")
soup = BeautifulSoup(open('page.html', 'r'), 'html.parser')
soup = BeautifulSoup(open('page.html', 'r'), 'html5lib')
soup = BeautifulSoup(open('page.html', 'r'), 'xml')
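
All four attempts change only the parser; the file is still opened in text mode without an encoding argument, so decoding happens the same way before any parser sees the data. One possible approach, sketched under the assumption that the page is UTF-8 (the question does not say), is to name the encoding explicitly when opening the file. The helper read_html and the sample page below are hypothetical.

```python
import os
import tempfile


def read_html(path, encoding="utf-8"):
    """Read an HTML file as text using an explicit encoding.

    Bytes that are invalid in `encoding` are replaced with U+FFFD
    instead of raising UnicodeDecodeError.
    """
    with open(path, "r", encoding=encoding, errors="replace") as fh:
        return fh.read()


# Self-contained demo: write a small hypothetical UTF-8 page, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "page.html")
    with open(demo_path, "wb") as fh:
        fh.write("<p>users\u2019 guide</p>".encode("utf-8"))
    html = read_html(demo_path)

print(ascii(html))
```

The returned string can be passed as the markup argument, for example BeautifulSoup(html, "lxml"), in place of the bare open('page.html') call. If the page turns out to use a different encoding, only the encoding argument needs to change.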

How can I parse this HTML page with Python 3?

Thanks.
