Python: read lines of website source code 100 lines at a time
Problem description
I'm trying to read the source code of a website 100 lines at a time.
For example:
self.code = urllib.request.urlopen(uri)
# Get the first 100 lines
self.lines = self.getLines()
...
# Get the next 100 lines
self.lines = self.getLines()
My getLines code looks like this:
def getLines(self):
    res = []
    i = 0
    while i < 100:
        res.append(str(self.code.readline()))
        i += 1
    return res
But the problem is that getLines() always returns the first 100 lines of the code.
I've seen some solutions that use next(), or tell() and seek(), but it seems that those functions are not implemented in the HTTPResponse class.
Recommended answer
According to the documentation, urllib.request.urlopen(uri) returns a file-like object, so you should be able to do:
from itertools import islice

def getLines(self):
    # islice takes at most 100 lines; the next call continues from there
    res = []
    for line in islice(self.code, 100):
        res.append(line)
    return res
There's more information on islice in the itertools documentation. Using iterators avoids the while loop and the manual increments.
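As a rough, self-contained sketch of the islice approach: the helper name read_batch and the io.BytesIO stand-in for the HTTP response are assumptions made for illustration, so the example runs without a network connection. The lines are decoded explicitly because the response yields bytes, and calling str() on bytes produces the "b'...'" representation rather than the text.

```python
import io
from itertools import islice


def read_batch(stream, size=100):
    """Return up to `size` decoded lines, continuing where the stream left off."""
    # islice consumes the stream lazily, so the next call resumes after these lines.
    return [raw.decode("utf-8", errors="replace").rstrip("\r\n")
            for raw in islice(stream, size)]


# io.BytesIO is a stand-in (assumption) for urllib.request.urlopen(uri).
fake_response = io.BytesIO(b"".join(b"line %d\n" % n for n in range(250)))

first = read_batch(fake_response)   # lines 0 to 99
second = read_batch(fake_response)  # lines 100 to 199
third = read_batch(fake_response)   # lines 200 to 249, a short final batch
fourth = read_batch(fake_response)  # empty list once the stream is exhausted
```

With a real response, the same helper would be called as read_batch(self.code), provided self.code is opened once and reused between calls.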
If you absolutely must use readline(), it's advisable to use a for loop, i.e.
for i in range(100):
    ...
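A possible way to fill in that loop is sketched below, under a few assumptions: read_batch_readline is a hypothetical helper name, io.BytesIO stands in for the HTTP response, and UTF-8 is assumed for decoding. Since readline() returns an empty bytes object at the end of the stream, the loop stops early instead of appending empty entries.

```python
import io


def read_batch_readline(stream, size=100):
    """Collect up to `size` lines with readline(), stopping early at end of stream."""
    res = []
    for _ in range(size):
        raw = stream.readline()
        if not raw:  # readline() returns b"" once the stream is exhausted
            break
        res.append(raw.decode("utf-8", errors="replace"))
    return res


# io.BytesIO is a stand-in (assumption) for the response object.
sample = io.BytesIO(b"a\nb\nc\n")
batch_one = read_batch_readline(sample, size=2)
batch_two = read_batch_readline(sample, size=2)
```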