How to Parse HTML with Non-ASCII Characters using BeautifulSoup?

Published: 2020/9/20 7:36:01 | Tags: python, beautifulsoup

Problem description

I keep getting the following error when trying to parse some HTML with BeautifulSoup:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xae in position 0: ordinal not in range(128)
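For reference, the message itself can be reproduced in isolation. This is a minimal Python 3 sketch (not the question's Python 2.5, and not involving BeautifulSoup at all): ASCII only defines bytes 0x00 to 0x7F, so a byte such as 0xAE makes an ASCII decode fail, while single-byte encodings such as Latin-1 and Windows-1252 map it to the registered sign ®. The question does not say which encoding the page actually uses, so Latin-1 and cp1252 here are only examples.

```python
raw = b"\xae"  # the single byte named in the error message

message = None
try:
    raw.decode("ascii")  # ASCII only defines bytes 0x00-0x7F
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
# 'ascii' codec can't decode byte 0xae in position 0: ordinal not in range(128)

# Single-byte encodings that do define 0xAE both map it to U+00AE (®).
print(raw.decode("latin-1"))  # ®
print(raw.decode("cp1252"))   # ®
```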

I've tried decoding the HTML using the solutions to the questions listed below, but I keep getting the same error. I'm listing them so that I don't get duplicate answers, and in case they help anyone find a solution by seeing related approaches to the problem.
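A common approach in the related questions is to turn the raw bytes into Unicode text yourself before parsing. The sketch below assumes the page arrives as bytes and uses a hypothetical helper, `decode_html`, whose fallback order (declared `charset`, then UTF-8, then Windows-1252) is an illustrative choice, not a standard. It is standard-library Python 3 only and is not BeautifulSoup's own encoding detection.

```python
import re

# Hypothetical pattern: finds a declared charset such as <meta charset="utf-8">
# or content="text/html; charset=windows-1252" (ASCII-only, case-insensitive).
_CHARSET_RE = re.compile(rb"charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def decode_html(raw: bytes, fallback: str = "cp1252") -> str:
    """Hypothetical helper: decode page bytes to str before parsing.

    Tries the charset declared in the first 1024 bytes (if any), then UTF-8,
    then `fallback`, replacing any bytes the fallback still cannot decode.
    """
    candidates = []
    match = _CHARSET_RE.search(raw[:1024])
    if match:
        candidates.append(match.group(1).decode("ascii"))
    candidates.append("utf-8")

    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (LookupError, UnicodeDecodeError):
            # Unknown codec name, or bytes invalid in that codec: try the next one.
            continue
    return raw.decode(fallback, errors="replace")


page = b'<meta charset="windows-1252"><p title="Acme\xae">Acme\xae</p>'
print(decode_html(page))  # <meta charset="windows-1252"><p title="Acme®">Acme®</p>
```

The returned `str` can then be handed to whichever parser you use. If you are on BeautifulSoup 4 (the `bs4` package) rather than the BeautifulSoup 3 module imported in the question, its constructor also accepts bytes together with an optional `from_encoding` hint, and its own detector is available as `bs4.UnicodeDammit`.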

Does anybody know where I'm going wrong? Is this a bug in BeautifulSoup, and should I install an earlier version?

Code and traceback:

from BeautifulSoup import BeautifulSoup as bs  # BeautifulSoup 3 module name (Python 2)
soup = bs(html)  # html holds the page source

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/var/lib/python-support/python2.5/BeautifulSoup.py", line 1282, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/var/lib/python-support/python2.5/BeautifulSoup.py", line 946, in __init__
    self._feed()
  File "/var/lib/python-support/python2.5/BeautifulSoup.py", line 971, in _feed
    SGMLParser.feed(self, markup)
  File "/usr/lib/python2.5/sgmllib.py", line 99, in feed
    self.goahead(0)
  File "/usr/lib/python2.5/sgmllib.py", line 133, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.5/sgmllib.py", line 285, in parse_starttag
    self._convert_ref, attrvalue)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xae in position 0: ordinal not in range(128)
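The traceback ends inside sgmllib's `parse_starttag`, on a call that passes `self._convert_ref` and `attrvalue`, which suggests the failure happens while references inside a start tag's attribute value are being converted. sgmllib and BeautifulSoup 3 exist only for Python 2, so as a point of comparison this sketch swaps in Python 3's standard-library `html.parser` (a different parser from the one in the question). It feeds a `str` whose attributes contain both a literal ® and the `&reg;` entity; `AttrCollector` is a hypothetical class that just records what the parser reports.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records (tag, attrs) for every start tag seen."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; html.parser has already
        # expanded entity and character references in the values.
        self.tags.append((tag, attrs))


markup = '<a title="Acme\u00ae" alt="Acme&reg;" href="/x">Acme</a>'
parser = AttrCollector()
parser.feed(markup)
parser.close()
print(parser.tags)  # [('a', [('title', 'Acme®'), ('alt', 'Acme®'), ('href', '/x')])]
```

Because the markup is already a `str`, both attribute values come back as `str` with the reference expanded, and no implicit ASCII decoding takes place.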

Error message, per the comment below:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/var/lib/python-support/python2.5/BeautifulSoup.py", line 1282, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/var/lib/python-support/python2.5/BeautifulSoup.py", line 946, in __init__
    self._feed()
  File "/var/lib/python-support/python2.5/BeautifulSoup.py", line 971, in _feed
    SGMLParser.feed(self, markup)
  File "/usr/lib/python2.5/sgmllib.py", line 99, in feed
    self.goahead(0)
  File "/usr/lib/python2.5/sgmllib.py", line 133, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.5/sgmllib.py", line 285, in parse_starttag
    self._convert_ref, attrvalue)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xae in position 0: ordinal not in range(128)

Thanks for your help!

Related questions whose solutions I already tried:

'ascii' codec error in BeautifulSoup

UnicodeEncodeError: 'ascii' codec can't encode character u'\xef' in position 0: ordinal not in range(128)

How do I convert a file's format from Unicode to ASCII using Python?

python UnicodeEncodeError > How can I simply remove troublesome Unicode characters?
