Twisted Python getPage

Problem description

I tried to get support on this but I am TOTALLY confused.

Here's my code:


from twisted.internet import reactor
from twisted.web.client import getPage
from twisted.web.error import Error
from twisted.internet.defer import DeferredList
from sys import argv

class GrabPage:
    def __init__(self, page):
        self.page = page

    def start(self, *args):
        if args == ():
            # We apparently don't need authentication for this
            d1 = getPage(self.page)
        else:
            if len(args) == 2:
                # We have our login information
                d1 = getPage(self.page, headers={"Authorization": " ".join(args)})
            else:
                raise Exception('Missing parameters')

        d1.addCallback(self.pageCallback)
        dl = DeferredList([d1])
        d1.addErrback(self.errorHandler)
        dl.addCallback(self.listCallback)

    def errorHandler(self,result):
        # Bad thingy!
        pass

    def pageCallback(self, result):
        return result

    def listCallback(self, result):
        print result

a = GrabPage('http://www.google.com')
data = a.start() # Not the HTML

I wish to get the HTML out which is given to pageCallback when start() is called. This has been a pita for me. Ty! And sorry for my sucky coding.

Solution

You're missing the basics of how Twisted operates. It all revolves around the reactor, which you're never even running. Think of the reactor like this:


[Image not shown] (source: krondo.com)

Until you start the reactor, setting up deferreds only chains callbacks together; there are no events to fire them.
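To make the "chained but never fired" point concrete, here is a minimal sketch in plain Python 3 (the snippets in this thread are Python 2). ToyDeferred, add_callback and fire are hypothetical names invented for this illustration; this is not Twisted's Deferred implementation, just a toy that mimics the idea that callbacks sit in a chain until a result arrives.

```python
class ToyDeferred:
    """Toy stand-in for a Deferred (hypothetical, not Twisted's class).

    It stores callbacks until a result arrives, then runs them in order.
    """

    def __init__(self):
        self.callbacks = []
        self.fired = False
        self.result = None

    def add_callback(self, fn):
        if self.fired:
            # The result is already available: run the callback right away.
            self.result = fn(self.result)
        else:
            # No result yet: just remember the callback.
            self.callbacks.append(fn)
        return self

    def fire(self, result):
        # Called when the awaited event (e.g. a response) finally happens.
        self.fired = True
        self.result = result
        while self.callbacks:
            fn = self.callbacks.pop(0)
            # Each callback receives the previous callback's return value.
            self.result = fn(self.result)


seen = []


def save_page(html):
    seen.append(html)
    return html  # passed on to the next callback in the chain


def page_length(html):
    seen.append(len(html))
    return len(html)


d = ToyDeferred()
d.add_callback(save_page).add_callback(page_length)
print(seen)  # [] : the callbacks are chained, but nothing has run

d.fire("<html>hello</html>")
print(seen)  # ['<html>hello</html>', 18]
```

In the question's code nothing ever plays the role of fire(), because the reactor that would deliver the response is never started.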

I recommend you give the Twisted Intro by Dave Peticolas a read. It's quick and it really gives you all the missing information that the Twisted documentation doesn't.

Anyway, here is about the most basic usage example of getPage possible:

from twisted.web.client import getPage
from twisted.internet import reactor

url = 'http://aol.com'

def print_and_stop(output):
    print output
    if reactor.running:
        reactor.stop()

if __name__ == '__main__':
    print 'fetching', url
    d = getPage(url)
    d.addCallback(print_and_stop)
    reactor.run()

Since getPage returns a deferred, I'm adding the callback print_and_stop to the deferred's callback chain. After that, I start the reactor. Once the response arrives, the reactor fires the deferred, which calls print_and_stop; that prints the data from aol.com and then stops the reactor.
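The order of events can be sketched without Twisted or a network connection. Everything below is a toy written for this explanation in Python 3: ToyReactor, ToyDeferred, call_later and fake_get_page are hypothetical names, the URL is a placeholder, and the loop is far simpler than a real reactor. It only shows that the callback cannot run until the loop is started.

```python
from collections import deque


class ToyReactor:
    """Toy event loop (hypothetical): runs queued calls until stopped."""

    def __init__(self):
        self.pending = deque()
        self.running = False

    def call_later(self, fn, *args):
        # Queue work; nothing is executed until run() is called.
        self.pending.append((fn, args))

    def run(self):
        self.running = True
        # Unlike a real reactor, this toy loop also exits when idle.
        while self.running and self.pending:
            fn, args = self.pending.popleft()
            fn(*args)
        self.running = False

    def stop(self):
        self.running = False


class ToyDeferred:
    """Minimal stand-in for a deferred: one result, a list of callbacks."""

    def __init__(self):
        self.callbacks = []

    def add_callback(self, fn):
        self.callbacks.append(fn)
        return self

    def fire(self, result):
        for fn in self.callbacks:
            result = fn(result)


reactor = ToyReactor()
output = []


def fake_get_page(url):
    """Hypothetical stand-in for getPage: returns a canned body."""
    d = ToyDeferred()
    # The "response" is delivered only once the loop is running.
    reactor.call_later(d.fire, "<html>body of %s</html>" % url)
    return d


def print_and_stop(body):
    output.append(body)
    print(body)
    if reactor.running:
        reactor.stop()


d = fake_get_page("http://example.invalid")
d.add_callback(print_and_stop)
print(len(output))  # 0 : the loop has not run, so nothing has fired
reactor.run()       # delivers the response, which calls print_and_stop
print(len(output))  # 1
```

Leaving out the reactor.run() line reproduces the situation in the question: the deferred exists and has a callback, but the callback is never called.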

Edit to show a working example of OP's code:

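from twisted.internet import reactor
from twisted.web.client import getPage
from twisted.internet.defer import DeferredList

class GrabPage: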
    def __init__(self, page):
        self.page = page
        ########### I added this:
        self.data = None

    def start(self, *args):
        if args == ():
            # We apparently don't need authentication for this
            d1 = getPage(self.page)
        else:
            if len(args) == 2:
                # We have our login information
                d1 = getPage(self.page, headers={"Authorization": " ".join(args)})
            else:
                raise Exception('Missing parameters')

        d1.addCallback(self.pageCallback)
        dl = DeferredList([d1])
        d1.addErrback(self.errorHandler)
        dl.addCallback(self.listCallback)

    def errorHandler(self,result):
        # Bad thingy!
        pass

    def pageCallback(self, result):
        ########### I added this, to hold the data:
        self.data = result
        return result

    def listCallback(self, result):
        print result
        # Added for effect:
        if reactor.running:
            reactor.stop()

a = GrabPage('http://google.com')
########### Just call it without assigning to data
#data = a.start() # Not the HTML
a.start()

########### I added this:
if not reactor.running:
    reactor.run()

########### Reference the data attribute from the class
data = a.data
print '------REACTOR STOPPED------'
print
########### First 100 characters of a.data:
print '------a.data[:100]------'
print data[:100] 
