Twisted Python getPage
I tried to get support on this but I am TOTALLY confused.
Here's my code:
from twisted.internet import reactor
from twisted.web.client import getPage
from twisted.web.error import Error
from twisted.internet.defer import DeferredList
from sys import argv

class GrabPage:
    def __init__(self, page):
        self.page = page

    def start(self, *args):
        if args == ():
            # We apparently don't need authentication for this
            d1 = getPage(self.page)
        else:
            if len(args) == 2:
                # We have our login information
                d1 = getPage(self.page, headers={"Authorization": " ".join(args)})
            else:
                raise Exception('Missing parameters')
        d1.addCallback(self.pageCallback)
        dl = DeferredList([d1])
        d1.addErrback(self.errorHandler)
        dl.addCallback(self.listCallback)

    def errorHandler(self, result):
        # Bad thingy!
        pass

    def pageCallback(self, result):
        return result

    def listCallback(self, result):
        print(result)

a = GrabPage('http://www.google.com')
data = a.start()  # Not the HTML
I want to get at the HTML that is passed to pageCallback when start() is called. This has been a PITA for me. Thanks! And sorry for my sucky coding.
You're missing the basics of how Twisted operates. It all revolves around the reactor, which you never even run. Think of the reactor like this:
(source: krondo.com)
Until you start the reactor, setting up deferreds only chains callbacks together; no events ever occur that could fire them.
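To make that concrete, here is a minimal toy model in plain Python. ToyDeferred, ToyReactor, and toy_get_page are hypothetical names invented for this sketch; they are not Twisted's classes and they leave out almost everything the real ones do (error handling, I/O, callbacks added after firing). The sketch only shows the ordering: the callback is registered first, and nothing is delivered until the loop runs.

```python
# Toy model only: these classes are NOT Twisted's real Deferred or reactor.
class ToyDeferred:
    def __init__(self):
        self.callbacks = []
        self.called = False

    def add_callback(self, fn):
        # Registering a callback does not run it.
        self.callbacks.append(fn)
        return self

    def fire(self, result):
        # Pass the result through each callback in order.
        self.called = True
        for fn in self.callbacks:
            result = fn(result)
        return result


class ToyReactor:
    def __init__(self):
        self.pending = []  # (deferred, result) pairs waiting for the loop

    def schedule(self, deferred, result):
        self.pending.append((deferred, result))

    def run(self):
        # Deliver every pending result, then return.
        while self.pending:
            deferred, result = self.pending.pop(0)
            deferred.fire(result)


toy_reactor = ToyReactor()
received = []


def toy_get_page(url):
    # Pretend to start a request: the "response" is only queued here.
    d = ToyDeferred()
    toy_reactor.schedule(d, "<html>fake body for %s</html>" % url)
    return d


d = toy_get_page("http://example.com")
d.add_callback(received.append)

before = list(received)  # snapshot taken before the loop runs
toy_reactor.run()
after = list(received)   # snapshot taken after the loop has run
```

In this toy, before is empty and after holds the fake page body, which mirrors why the question's a.start() hands back nothing useful: the chain is set up, but no loop has run to push a result through it.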
I recommend you give the Twisted Intro by Dave Peticolas a read. It's quick and it really gives you all the missing information that the Twisted documentation doesn't.
Anyway, here is about the most basic usage example of getPage possible:
from twisted.web.client import getPage
from twisted.internet import reactor

url = 'http://aol.com'

def print_and_stop(output):
    # Called with the page body once the request completes
    print(output)
    if reactor.running:
        reactor.stop()

if __name__ == '__main__':
    print('fetching %s' % url)
    d = getPage(url)
    d.addCallback(print_and_stop)
    reactor.run()
Since getPage returns a deferred, I add the callback print_and_stop to the deferred chain. After that, I start the reactor. Once the reactor is running, it carries out the request that getPage set up; when the response arrives, the deferred fires print_and_stop, which prints the data from aol.com and then stops the reactor.
Edit to show a working example of OP's code:
from twisted.internet import reactor
from twisted.web.client import getPage
from twisted.internet.defer import DeferredList

class GrabPage:
    def __init__(self, page):
        self.page = page
        ########### I added this:
        self.data = None

    def start(self, *args):
        if args == ():
            # We apparently don't need authentication for this
            d1 = getPage(self.page)
        else:
            if len(args) == 2:
                # We have our login information
                d1 = getPage(self.page, headers={"Authorization": " ".join(args)})
            else:
                raise Exception('Missing parameters')
        d1.addCallback(self.pageCallback)
        dl = DeferredList([d1])
        d1.addErrback(self.errorHandler)
        dl.addCallback(self.listCallback)

    def errorHandler(self, result):
        # Bad thingy!
        pass

    def pageCallback(self, result):
        ########### I added this, to hold the data:
        self.data = result
        return result

    def listCallback(self, result):
        print(result)
        # Added for effect:
        if reactor.running:
            reactor.stop()

a = GrabPage('http://google.com')
########### Just call it without assigning to data
#data = a.start() # Not the HTML
a.start()
########### I added this:
if not reactor.running:
    reactor.run()
########### Reference the data attribute from the class
data = a.data
print('------REACTOR STOPPED------')
print('')
########### First 100 characters of a.data:
print('------a.data[:100]------')
print(data[:100])
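One detail the example keeps unchanged from the question: the Authorization header is built by joining the two arguments with a space. If the server expects HTTP Basic authentication (an assumption; the question never says which scheme is in use), the header value would instead be the word Basic followed by the base64 encoding of "username:password". A minimal sketch using only the standard library, where basic_auth_header is a hypothetical helper name and the credentials are placeholders:

```python
import base64


def basic_auth_header(username, password):
    """Return the value for an HTTP Basic 'Authorization' header."""
    # Basic auth encodes "username:password" as base64.
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return "Basic " + token


# Placeholder credentials, for illustration only.
header_value = basic_auth_header("Aladdin", "open sesame")
```

The result could then be passed the same way the example already passes its header, as headers={"Authorization": header_value}.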