Python Scrapy: passing properties into parser

Question

I'm new to Scrapy and web scraping in general, so this might be a stupid question, but it wouldn't be the first time, so here goes.

I have a simple Scrapy spider, based on the tutorial example, that processes various URLs (in start_urls). I would like to categorise the URLs, e.g. URLs A, B, and C are Category 1, while URLs D and E are Category 2, and then store the category on the resulting Items when the parser processes the response for each URL.

I guess I could have a separate spider for each category and hold the category as an attribute on the class so the parser can pick it up from there. But I was hoping to have just one spider for all the URLs and tell the parser which category to use for a given URL.

Right now, I'm setting up the URLs in start_urls via my spider's __init__() method. How do I pass the category for a given URL from __init__() to the parser, so that I can record the category on the Items generated from the responses for that URL?

Recommended answer

As paul t. suggested:

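from scrapy import Request, Spider


class MySpider(Spider):
    # A plain Spider is used here instead of CrawlSpider: CrawlSpider uses
    # parse() internally, so overriding it there breaks its rule handling.

    def start_requests(self):
        ...
        # Attach a category to each request through its meta dict.
        yield Request(url1, meta={'category': 'cat1'}, callback=self.parse)
        yield Request(url2, meta={'category': 'cat2'}, callback=self.parse)
        ...

    def parse(self, response):
        # The meta dict set on the request is available on the response.
        category = response.meta['category']
        ...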

You use start_requests to control the first URLs you visit and attach metadata to each request. The callback can then read that metadata through response.meta.

The same approach works if you need to pass data from one callback to another, for instance from parse to parse_item.
