BeautifulSoup HTTPResponse has no attribute encode


Problem description

I'm trying to get BeautifulSoup working with a URL, like the following:

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://proxies.org")
soup = BeautifulSoup(html.encode("utf-8"), "html.parser")
print(soup.find_all('a'))

However, I am getting an error:

 File "c:\Python3\ProxyList.py", line 3, in <module>
    html = urlopen("http://proxies.org").encode("utf-8")
AttributeError: 'HTTPResponse' object has no attribute 'encode'

Any idea why? Could it have something to do with the urlopen function? Why does it need the utf-8?

There clearly seem to be some differences between Python 3 and BeautifulSoup4 regarding the examples that are given (which seem to be out of date or wrong now)...

Recommended answer

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://proxies.org")
soup = BeautifulSoup(html, "html.parser")
print(soup.find_all('a'))

  1. First, urlopen returns a file-like object (an HTTPResponse), not a string, so it has no encode method.
  2. BeautifulSoup accepts a file-like object and decodes it automatically, so you do not need to encode anything yourself.
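The two points above can be checked without touching the network. The following is only a minimal sketch: `fake_response` is a hypothetical stand-in (an `io.BytesIO`) for the object that `urlopen` returns, and the sample markup is made up for illustration.

```python
import io
from http.client import HTTPResponse

# HTTPResponse is file-like: it offers read(), but no str method such as encode().
has_read = hasattr(HTTPResponse, "read")
has_encode = hasattr(HTTPResponse, "encode")

# Hypothetical stand-in for the urlopen() result, so the sketch runs offline.
fake_response = io.BytesIO("<a href='/x'>caf\u00e9</a>".encode("utf-8"))

raw = fake_response.read()   # reading a binary file-like object yields bytes
text = raw.decode("utf-8")   # bytes are decoded to str; only str has encode()

print(has_read, has_encode)
print(type(raw).__name__, type(text).__name__)
```

If a plain string is needed for some other purpose, reading the response and calling `decode` on the bytes is the direction to go; `encode` is the reverse operation and only exists on `str`.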

From the documentation:

To parse a document, pass it into the BeautifulSoup constructor. You can pass in a string or an open filehandle:

from bs4 import BeautifulSoup

soup = BeautifulSoup(open("index.html"))

soup = BeautifulSoup("<html>data</html>")

First, the document is converted to Unicode, and HTML entities are converted to Unicode characters.
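A small sketch of that conversion, assuming the `bs4` package is installed. The markup is invented for illustration, and `from_encoding="utf-8"` is passed only to make the byte decoding deterministic here; BeautifulSoup would otherwise try to detect the encoding on its own.

```python
from bs4 import BeautifulSoup

# Entities in a str document are turned into the corresponding Unicode characters.
soup = BeautifulSoup("<p>caf&eacute; &amp; cr&egrave;me</p>", "html.parser")
entity_text = soup.p.get_text()

# A bytes document is decoded to Unicode before parsing.
raw = "<p>caf\u00e9</p>".encode("utf-8")
soup_from_bytes = BeautifulSoup(raw, "html.parser", from_encoding="utf-8")
bytes_text = soup_from_bytes.p.get_text()

print(entity_text)
print(bytes_text)
```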
