Python Scrapy: passing properties into parser

Views: 38 | Published: 2021/7/17 18:33:39 | Tags: python, scrapy

Question

I'm new to Scrapy and web scraping in general, so this might be a stupid question, but it wouldn't be the first time, so here goes.

I have a simple Scrapy spider, based on the tutorial example, that processes various URLs (in start_urls). I would like to categorise the URLs, e.g. URLs A, B, and C are Category 1, while URLs D and E are Category 2, and then store the category on the resulting Items when the parser processes the response for each URL.

I guess I could have a separate spider for each category, then just hold the category as an attribute on the class so the parser can pick it up from there. But I was kind of hoping I could have just one spider for all the URLs, but tell the parser which category to use for a given URL.

Right now, I'm setting up the URLs in start_urls via my spider's __init__() method. How do I pass the category for a given URL from my __init__ method to the parser so that I can record the category on the Items generated from the responses for that URL?

Recommended answer

As paul t. suggested:

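import scrapy


class MySpider(scrapy.Spider):
    # Plain Spider rather than CrawlSpider: the Scrapy docs advise against
    # overriding parse() on a CrawlSpider, which uses it for its own logic.
    name = "myspider"  # placeholder; Scrapy requires every spider to have a name

    def start_requests(self):
        ...
        # url1 and url2 are the placeholders from the original answer.
        # The meta dict travels with each request and comes back on its response.
        yield scrapy.Request(url1, meta={'category': 'cat1'}, callback=self.parse)
        yield scrapy.Request(url2, meta={'category': 'cat2'}, callback=self.parse)
        ...

    def parse(self, response):
        category = response.meta['category']
        ...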

You override start_requests to control the first requests the spider makes, attaching metadata to each one through the meta argument, and you can then read that metadata in the callback through response.meta.

The same thing applies if you need to pass data from a parse function on to a later callback such as parse_item, for instance.
