CrawlSpider with Splash getting stuck after first URL


Question

I'm writing a Scrapy spider where I need to render some of the responses with Splash. My spider is based on CrawlSpider. I need to render my start_url responses to feed my crawl spider. Unfortunately, my crawl spider stops after rendering the first response. Any idea what is going wrong?

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class VideoSpider(CrawlSpider):

    start_urls = ['https://juke.com/de/de/search?q=1+Mord+f%C3%BCr+2']

    rules = (
        Rule(LinkExtractor(allow=()), callback='parse_items',
             process_request='use_splash'),
    )

    def use_splash(self, request):
        request.meta['splash'] = {
            'endpoint': 'render.html',
            'args': {'wait': 0.5},
        }
        return request

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url, self.parse, meta={
                'splash': {
                    'endpoint': 'render.html',
                    'args': {'wait': 0.5},
                }
            })

    def parse_items(self, response):
        data = response.body
        print(data)

Answer

Use SplashRequest instead of scrapy.Request... Check out my answer to CrawlSpider with Splash.
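In practice that means building both the initial requests and the requests extracted by the rules as SplashRequest objects from the scrapy-splash package, instead of attaching the splash meta by hand. Below is a minimal sketch of that idea; the spider name and the exact re-wrapping in use_splash (copying the callback and meta that CrawlSpider attached to each extracted request) are my assumptions about how to keep the rule machinery intact, not details spelled out in the answer:

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from scrapy_splash import SplashRequest


class VideoSpider(CrawlSpider):
    name = 'video'  # hypothetical name; the original snippet omits it
    start_urls = ['https://juke.com/de/de/search?q=1+Mord+f%C3%BCr+2']

    rules = (
        Rule(LinkExtractor(allow=()), callback='parse_items',
             process_request='use_splash'),
    )

    def use_splash(self, request, response=None):
        # Scrapy 2.0+ also passes the response to process_request;
        # the default keeps this compatible with older versions.
        # Re-wrap each extracted link as a SplashRequest, keeping the
        # callback and meta CrawlSpider attached so the rule
        # bookkeeping survives the swap.
        return SplashRequest(request.url, callback=request.callback,
                             meta=request.meta,
                             endpoint='render.html', args={'wait': 0.5})

    def start_requests(self):
        for url in self.start_urls:
            # Feed the rendered start_url response to self.parse so
            # that CrawlSpider can apply the rules to it.
            yield SplashRequest(url, self.parse,
                                endpoint='render.html', args={'wait': 0.5})

    def parse_items(self, response):
        print(response.body)

This assumes scrapy-splash is wired up in settings.py as its README describes, roughly:

# settings.py wiring for scrapy-splash, per the scrapy-splash README
# (assumes a Splash instance listening on localhost:8050)
SPLASH_URL = 'http://localhost:8050'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'

If the rules still do not fire after switching to SplashRequest, a commonly cited cause is that some scrapy-splash versions return a response class that CrawlSpider's link extraction does not treat as an HtmlResponse; related Stack Overflow threads discuss overriding CrawlSpider._requests_to_follow as a workaround.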
