BeautifulSoup HTTPResponse has no attribute encode
Question
I'm trying to get BeautifulSoup working with a URL, like the following:
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://proxies.org")
soup = BeautifulSoup(html.encode("utf-8"), "html.parser")
print(soup.find_all('a'))
However, I am getting an error:
File "c:\Python3\ProxyList.py", line 3, in <module>
html = urlopen("http://proxies.org").encode("utf-8")
AttributeError: 'HTTPResponse' object has no attribute 'encode'
Any idea why? Could it have to do with the urlopen function? Why does it need the utf-8?
There clearly seem to be some differences between Python 3 and BeautifulSoup4 regarding the examples that are given (which seem to be out of date or wrong now)...
Recommended answer
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://proxies.org")
soup = BeautifulSoup(html, "html.parser")
print(soup.find_all('a'))
First, urlopen returns a file-like object (an HTTPResponse), not a string, so it has no encode method; encode is a method of str. BeautifulSoup can accept a file-like object and decode it automatically, so you do not need to handle the encoding yourself.
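To make the cause of the AttributeError concrete, here is a minimal sketch using only the standard library. It assumes io.BytesIO as a stand-in for the object urlopen returns (both are binary file-like objects), so it runs without network access; the sample markup is made up for illustration.

```python
import io

# Stand-in for the HTTPResponse returned by urlopen(): a binary file-like object.
# (Assumption: io.BytesIO replaces the real response so no network is needed.)
fake_response = io.BytesIO("<a href='/x'>café</a>".encode("utf-8"))

# encode() is a str method, so a file-like object does not have it.
has_encode = hasattr(fake_response, "encode")

# Reading the object yields bytes, which are decoded (not encoded) into str.
raw = fake_response.read()
text = raw.decode("utf-8")

print(has_encode)
print(type(raw).__name__)
print(text)
```

With BeautifulSoup you can skip the read and decode steps and pass the response object directly, as in the answer above.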
From the documentation:
To parse a document, pass it into the BeautifulSoup constructor. You can pass in a string or an open filehandle:
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("index.html"))
soup = BeautifulSoup("<html>data</html>")
First, the document is converted to Unicode, and HTML entities are converted to Unicode characters.
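As a small illustration of that conversion, the sketch below feeds BeautifulSoup a binary file-like object containing HTML entities. The markup is invented for the example, and io.BytesIO is again assumed as a stand-in for a real response or file.

```python
import io
from bs4 import BeautifulSoup

# Hypothetical markup: ASCII bytes containing named and numeric HTML entities.
markup = b"<p>caf&eacute; &amp; th&#233;</p>"

# BeautifulSoup reads the file-like object, decodes the bytes to Unicode,
# and converts the entities into the characters they stand for.
soup = BeautifulSoup(io.BytesIO(markup), "html.parser")
text = soup.p.get_text()

print(text)  # café & thé
```

The result is an ordinary str, so no manual encode or decode call is involved at any point.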