scrapy unable to make Request() callback
Problem description
I am trying to write a recursive parsing script with Scrapy, but Request() never calls the callback function suppose_to_parse(), nor any other function I pass as the callback value. I have tried several variations, and none of them work. Where should I dig?
from scrapy.http import Request
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

class joomler(BaseSpider):
    name = "scrapy"
    allowed_domains = ["scrapy.org"]
    start_urls = ["http://blog.scrapy.org/"]

    def parse(self, response):
        print "Working... "+response.url
        hxs = HtmlXPathSelector(response)
        for link in hxs.select('//a/@href').extract():
            if not link.startswith('http://') and not link.startswith('#'):
                url=""
                url=(self.start_urls[0]+link).replace('//','/')
                print url
                yield Request(url, callback=self.suppose_to_parse)

    def suppose_to_parse(self, response):
        print "asdasd"
        print response.url
Recommended answer
Move the yield outside of the if statement:
for link in hxs.select('//a/@href').extract():
    url = link
    if not link.startswith('http://') and not link.startswith('#'):
        url = (self.start_urls[0] + link).replace('//','/')

    print url
    yield Request(url, callback=self.suppose_to_parse)
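
To see what moving the yield changes without running a crawl, the two loop layouts can be compared as plain generators. This is a minimal Python 3 sketch, not the original spider: it needs no Scrapy, LINKS is a made-up list standing in for whatever the //a/@href XPath returns, and the function names are hypothetical. The last function uses urljoin from the standard library, an alternative the answer does not mention, included only for comparison.

```python
from urllib.parse import urljoin

BASE = "http://blog.scrapy.org/"  # start URL taken from the question

# Hypothetical hrefs (assumption): stand-ins for the XPath results.
LINKS = ["http://scrapy.org/doc/", "/about/", "#top", "archive/2012/"]


def urls_yield_inside_if(links):
    """Question's layout: only links that pass the if reach the yield."""
    for link in links:
        if not link.startswith("http://") and not link.startswith("#"):
            url = (BASE + link).replace("//", "/")
            yield url


def urls_yield_outside_if(links):
    """Answer's layout: every link reaches the yield."""
    for link in links:
        url = link
        if not link.startswith("http://") and not link.startswith("#"):
            url = (BASE + link).replace("//", "/")
        yield url


def urls_with_urljoin(links):
    """Alternative sketch: resolve each href against BASE with urljoin."""
    for link in links:
        if link.startswith("#"):
            continue  # skip in-page anchors
        yield urljoin(BASE, link)


inside = list(urls_yield_inside_if(LINKS))
outside = list(urls_yield_outside_if(LINKS))
joined = list(urls_with_urljoin(LINKS))

for label, urls in (("inside", inside), ("outside", outside), ("urljoin", joined)):
    print(label, urls)
```

With these sample links, the question's layout never yields the absolute link, while the answer's layout yields every link, including the bare #top fragment. Both layouts also turn http:// into http:/ because .replace('//','/') rewrites every double slash, so the generated URLs are worth printing and checking whenever a callback does not fire.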