How to dynamically set Scrapy rules?


Question

I have a spider class whose rules are defined at class level, before __init__ runs:

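# Import paths used by the older Scrapy releases that ship SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor


class NoFollowSpider(CrawlSpider):
    # Class-level rules: follow every link and send each page to parse_items
    rules = (
        Rule(SgmlLinkExtractor(allow=("",)), callback="parse_items", follow=True),
    )

    def __init__(self, moreparams=None, *args, **kwargs):
        super(NoFollowSpider, self).__init__(*args, **kwargs)
        self.moreparams = moreparams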

I run this spider with the following command:

> scrapy runspider my_spider.py -a moreparams="more parameters" -o output.txt 

Now I want the static variable named rules to be configurable from the command line:

> scrapy runspider my_spider.py -a crawl_pages=True -a moreparams="more parameters" -o output.txt

So I changed __init__ to:

def __init__(self, crawl_pages=False, moreparams=None, *args, **kwargs):
    if crawl_pages is True:
        self.rules = (
            Rule(SgmlLinkExtractor(allow=("",)), callback="parse_items", follow=True),
        )
    self.moreparams = moreparams

However, when I set rules inside __init__, Scrapy no longer takes it into account: the spider runs, but it only crawls the given start_urls and not the whole domain. It seems that rules must be a static class variable.

So, how can I dynamically set a static variable?

Recommended answer

Here is how I resolved the problem, with great help from @Not_a_Golfer and @nramirezuy. I am simply using a bit of what each of them suggested:

class NoFollowSpider(CrawlSpider):

    def __init__(self, crawl_pages=False, moreparams=None, *args, **kwargs):
        super(NoFollowSpider, self).__init__(*args, **kwargs)
        # Values passed with -a arrive as strings, so accept "True" as well
        if crawl_pages in (True, "True"):
            # Set the class member from here
            NoFollowSpider.rules = (
                Rule(SgmlLinkExtractor(allow=("",)), callback="parse_items", follow=True),
            )
            # Then recompile the rules, since the base __init__ already compiled them
            super(NoFollowSpider, self)._compile_rules()

        # Keep going as before
        self.moreparams = moreparams
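
To illustrate why the recompile step matters, here is a minimal sketch that runs without Scrapy. It assumes, as far as I understand CrawlSpider, that the base class copies rules into an internal list once, inside its own __init__. MiniCrawlSpider, TooLate, InTime and this simplified Rule are hypothetical stand-ins written for the example, not Scrapy classes.

```python
class Rule(object):
    """Hypothetical stand-in for scrapy's Rule; it only records a name."""

    def __init__(self, name):
        self.name = name


class MiniCrawlSpider(object):
    """Hypothetical stand-in that mimics how CrawlSpider snapshots its rules."""

    rules = ()

    def __init__(self):
        # The rules are copied exactly once, while the base class initialises.
        self._compile_rules()

    def _compile_rules(self):
        self._rules = list(self.rules)


class TooLate(MiniCrawlSpider):
    def __init__(self, crawl_pages=False):
        super(TooLate, self).__init__()      # snapshot taken here: still empty
        if crawl_pages:
            self.rules = (Rule("follow"),)   # assigned after the snapshot


class InTime(MiniCrawlSpider):
    def __init__(self, crawl_pages=False):
        if crawl_pages:
            self.rules = (Rule("follow"),)   # assigned before the snapshot
        super(InTime, self).__init__()


late = TooLate(crawl_pages=True)
early = InTime(crawl_pages=True)
print(len(late._rules), len(early._rules))
```

In this sketch TooLate ends up with no compiled rules while InTime has one. Under the same assumption, assigning self.rules before calling the base __init__ might be an alternative to recompiling afterwards, and it would avoid changing the class attribute for every instance.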

Thanks everyone for your help!
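
A related detail: values passed with -a reach __init__ as strings, so an identity test such as crawl_pages is True does not match the text "True" typed on the command line. The helper below is a hypothetical sketch (as_bool is not part of Scrapy) of one way to normalise such a flag. The set of accepted spellings is an assumption chosen for this example.

```python
def as_bool(value):
    """Hypothetical helper: interpret a spider argument as a boolean."""
    if isinstance(value, bool):
        return value
    # Command-line values are text, so compare case-insensitively.
    return str(value).strip().lower() in ("1", "true", "yes", "on")


arg = "True"                  # what -a crawl_pages=True would hand to __init__
print(arg is True)            # identity check against the boolean fails
print(as_bool(arg))           # the normalised flag is usable in an if-statement
```

Inside the spider, the condition could then read if as_bool(crawl_pages): and behave the same whether the spider is started from the command line or instantiated from Python code with a real boolean.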
