Is there a way to run code after reactor.run() in scrapy?


Problem description


I am working on a Scrapy API. One of my issues was that the Twisted reactor isn't restartable; I fixed this by using CrawlerRunner instead of CrawlerProcess. My spider extracts links from a website and validates them. My issue is that if I add the validation code after reactor.run(), it doesn't run. This is my code:

import urllib.parse
from urllib.parse import urlparse

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from twisted.internet import reactor

links = set()          # renamed from `list`, which shadowed the builtin
list_validate = set()  # links on the crawled domain, to be validated after the crawl
runner = CrawlerRunner()

configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})


class Crawler(CrawlSpider):

    name = "Crawler"
    start_urls = ['https://www.example.com']  # was 'https:www.example.com' (missing //)
    allowed_domains = ['www.example.com']
    rules = [Rule(LinkExtractor(), callback='parse_links', follow=True)]

    def parse_links(self, response):
        # `url` was undefined here; reconstruct the base URL from the response instead
        base_url = '{0.scheme}://{0.netloc}'.format(urlparse(response.url))
        href = response.xpath('//a/@href').getall()
        links.add(urllib.parse.quote(response.url, safe=':/'))
        for link in href:
            if base_url not in link:
                links.add(urllib.parse.quote(response.urljoin(link), safe=':/'))
        for link in links:
            if base_url in link:
                list_validate.add(link)


runner.crawl(Crawler)
reactor.run()


If I add the code that validates the links after reactor.run(), it never gets executed. And if I put the code before reactor.run(), nothing happens, because the spider hasn't finished crawling all the links yet. What should I do? The code that validates the links is fine; I have used it before and it works.

Recommended answer

We can run code after the crawl finishes by attaching callbacks to the Deferred that runner.crawl() returns, via d.addCallback(<callback_function>) and d.addErrback(<errback_function>):

...
runner = CrawlerRunner()
d = runner.crawl(MySpider)  # crawl() returns a Deferred that fires when the crawl ends

def finished(result):
    print("finished :D")      # runs once the spider finishes successfully

def spider_error(failure):
    print("Spider error :/")  # runs if the crawl fails

d.addCallback(finished)
d.addErrback(spider_error)
d.addBoth(lambda _: reactor.stop())  # stop the reactor so reactor.run() can return
reactor.run()
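
Applied to the question's code, that means moving the validation step into the callback, so it only runs once the crawl has finished and list_validate is fully populated. Below is a minimal sketch of that wiring; validate_links is a hypothetical placeholder for the asker's existing validation code, while Crawler, runner, and list_validate are the names defined in the question:

def validate_links(result):
    # hypothetical placeholder for the validation code from the question;
    # by the time this runs, the spider is done and list_validate is complete
    for link in list_validate:
        print("validating:", link)

def spider_error(failure):
    print("Spider error :/", failure)

d = runner.crawl(Crawler)
d.addCallback(validate_links)        # fires after a successful crawl
d.addErrback(spider_error)           # fires if the crawl fails
d.addBoth(lambda _: reactor.stop())  # makes reactor.run() return afterwards
reactor.run()                        # blocks here until reactor.stop() is called

An equivalent pattern from the Scrapy documentation is to wrap the calls in a twisted.internet.defer.inlineCallbacks function and yield runner.crawl(...), putting the follow-up code right after the yield.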

