Python: read the lines of a website's source code 100 lines at a time


Question

I'm trying to read the source code of a website 100 lines at a time.

For example:

self.code = urllib.request.urlopen(uri)

#Get 100 first lines
self.lines = self.getLines()

...

#Get 100 next lines
self.lines = self.getLines()

My getLines code looks like this:

def getLines(self):
    res = []
    i = 0

    while i < 100:
        res.append(str(self.code.readline()))
        i+=1

    return res

The problem is that getLines() always returns the first 100 lines of the code.

I've seen some solutions that use next(), or tell() and seek(), but those functions do not seem to be implemented in the HTTPResponse class.

Recommended answer

According to the documentation, urllib.request.urlopen(uri) returns a file-like object, so you should be able to do:

from itertools import islice

def getLines(self):
    res = []
    # islice stops after 100 lines; the next call continues from there
    for line in islice(self.code, 100):
        res.append(line)
    return res

There's more information on islice in the itertools documentation. Using iterators avoids the while loop and the manual increment.
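As a rough illustration of why islice works here, the sketch below reads three consecutive batches from one stream. It assumes an io.BytesIO object as a stand-in for the response returned by urlopen, so that it runs without a network connection. The names read_batch and fake_response are hypothetical, and UTF-8 is an assumed encoding. A real response also yields bytes, so each line is decoded rather than passed to str(), which would give a repr such as "b'...'".

```python
import io
from itertools import islice

def read_batch(stream, size=100):
    """Return up to `size` decoded lines, continuing where the last call stopped."""
    # Assumption: the page is UTF-8 encoded; adjust for other encodings.
    return [raw.decode("utf-8").rstrip("\r\n") for raw in islice(stream, size)]

# Stand-in for urllib.request.urlopen(uri): a 250-line byte stream.
fake_response = io.BytesIO(b"".join(b"line %d\n" % n for n in range(250)))

first = read_batch(fake_response)   # lines 0..99
second = read_batch(fake_response)  # lines 100..199
third = read_batch(fake_response)   # the remaining 50 lines
```

Because the stream keeps its own position, each call picks up after the last line consumed, and the final batch is simply shorter once the data runs out.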

If you absolutely must use readline(), it's advisable to use a for loop, i.e.

for i in range(100):
    ... 
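
A minimal sketch of the readline() variant follows, again assuming an io.BytesIO stand-in for the HTTP response and UTF-8 content. The function name read_batch_readline is hypothetical. It stops early at the end of the stream, where readline() returns an empty bytes object, so the last batch does not get padded with empty strings.

```python
import io

def read_batch_readline(stream, size=100):
    """Read up to `size` lines with readline(), stopping at end of stream."""
    lines = []
    for _ in range(size):
        raw = stream.readline()
        if not raw:  # readline() returns b"" once the stream is exhausted
            break
        # Assumption: the content is UTF-8 encoded.
        lines.append(raw.decode("utf-8").rstrip("\r\n"))
    return lines

# Stand-in for urllib.request.urlopen(uri), using a small batch size for clarity.
fake_response = io.BytesIO(b"a\nb\nc\n")
batch_one = read_batch_readline(fake_response, size=2)
batch_two = read_batch_readline(fake_response, size=2)
```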
