CrawlSpider with Splash getting stuck after first URL
Problem description
I'm writing a Scrapy spider in which I need to render some of the responses with Splash. My spider is based on CrawlSpider, and I need the start_url responses rendered to feed the crawl rules. Unfortunately, the spider stops after rendering the first response. Any idea what is going wrong?
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class VideoSpider(CrawlSpider):
    start_urls = ['https://juke.com/de/de/search?q=1+Mord+f%C3%BCr+2']

    rules = (
        Rule(LinkExtractor(allow=()), callback='parse_items',
             process_request='use_splash'),
    )

    def use_splash(self, request):
        request.meta['splash'] = {
            'endpoint': 'render.html',
            'args': {
                'wait': 0.5,
            }
        }
        return request

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url, self.parse, meta={
                'splash': {
                    'endpoint': 'render.html',
                    'args': {'wait': 0.5}
                }
            })

    def parse_items(self, response):
        data = response.body
        print(data)
Answer
Use SplashRequest instead of scrapy.Request. Check out my answer to CrawlSpider with Splash.