How to read html from a url in python 3
I looked at previous similar questions and got only more confused.
In python 3.4, I want to read an html page as a string, given the url.
In perl I do this with LWP::Simple, using get().
A matplotlib 1.3.1 example says: import urllib; u1=urllib.urlretrieve(url). Python 3 can't find urlretrieve.
I tried u1 = urllib.request.urlopen(url), which appears to get an HTTPResponse object, but I can't print it or get a length on it or index it. u1.body doesn't exist. I can't find a description of the HTTPResponse in python3.
Is there an attribute in the HTTPResponse object which will give me the raw bytes of the html page?
(Irrelevant stuff from other questions includes urllib2, which doesn't exist in my python, csv parsers, etc.)
Edit:
I found something in a prior question which partially (mostly) does the job:
u2 = urllib.request.urlopen('http://finance.yahoo.com/q?s=aapl&ql=1')
for lines in u2.readlines():
    print(lines)
I say 'partially' because I don't want to read separate lines, but just one big string.
I could just concatenate the lines, but every line printed has a character 'b' prepended to it.
Where does that come from?
Again, I suppose I could delete the first character before concatenating, but that does get to be a kludge.
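Note that Python 3 does not read the html as a string but as bytes (that is where the b prefix on each printed line comes from: it is how a bytes object is displayed, not a character in the data), so you need to convert it to a string with decode.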
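import urllib.request

# urlopen() returns a response object; read() returns the whole body as bytes
with urllib.request.urlopen("http://www.python.org") as fp:
    mybytes = fp.read()

# decode the bytes into one str (this assumes the page is UTF-8 encoded)
mystr = mybytes.decode("utf8")
print(mystr)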
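The snippet above hard-codes utf8, which fails or garbles text on pages served in another encoding. As a rough sketch, the encoding can instead be taken from the Content-Type header of the response when the server sends one. The helper name fetch_html and its fallback_encoding parameter are made up for illustration and are not part of urllib; falling back to UTF-8 and replacing undecodable bytes are assumptions, not the only reasonable choices.

```python
import urllib.request


def fetch_html(url, fallback_encoding="utf-8"):
    """Return the body at `url` as one str.

    Hypothetical helper for illustration; not a standard-library function.
    """
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # the whole body, as bytes
        # Charset declared in the Content-Type header, or None if absent
        charset = response.headers.get_content_charset()
    # Assumption: use UTF-8 when no charset is declared, and replace
    # undecodable bytes instead of raising UnicodeDecodeError
    return raw.decode(charset or fallback_encoding, errors="replace")
```

Calling fetch_html("http://www.python.org") would then give the page as a single string, with no b prefix and no need to join lines. Note that a charset declared only inside the HTML itself (in a meta tag) is not looked at by this sketch.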