Pass extra values along with URLs to a Scrapy spider

Views: 48 Published: 2021/6/26 19:29:25 Tags: python python-2.7 web-scraping scrapy scrapy-spider

Question

I have a list of tuples of the form (id, url). I need to crawl a product from each URL in the list, and once the products are crawled, I need to store them in a database under their id.

The problem is that I can't work out how to pass the id to the parse function so that I can store each crawled item under its id.

Recommended answer

Initialize the start URLs in start_requests() and pass the id in meta:

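from scrapy import Spider, Request


class MySpider(Spider):
    # (id, url) pairs to crawl
    mapping = [(1, 'my_url1'), (2, 'my_url2')]

    ...

    def start_requests(self):
        # Attach each id to its request so it comes back with the response
        for item_id, url in self.mapping:
            yield Request(url, callback=self.parse_page, meta={'id': item_id})

    def parse_page(self, response):
        # Recover the id that was passed along with this URL
        item_id = response.meta['id']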
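
Once parse_page has read the id back from response.meta, it can put that id on the item it yields, and an item pipeline can use it as the database key. The original answer does not show the storage step, so what follows is only a rough sketch under assumptions: the SQLite backend, the products table, the url and title fields, and the SQLiteProductPipeline name are all hypothetical. Items are assumed to be plain dicts such as {'id': ..., 'url': ..., 'title': ...}.

```python
import sqlite3


class SQLiteProductPipeline(object):
    """Hypothetical item pipeline: stores each scraped item under its id."""

    def __init__(self, db_path=":memory:"):
        # db_path is an assumption; use a real file path to persist the data.
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Scrapy calls this hook once when the spider starts.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS products ("
            "id INTEGER PRIMARY KEY, url TEXT, title TEXT)"
        )

    def close_spider(self, spider):
        # Scrapy calls this hook once when the spider finishes.
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        # item["id"] is the value that travelled through Request.meta.
        # A second item with the same id overwrites the earlier row.
        self.conn.execute(
            "INSERT OR REPLACE INTO products (id, url, title) VALUES (?, ?, ?)",
            (item["id"], item["url"], item.get("title")),
        )
        return item
```

With something like this in place, parse_page would yield a dict that includes response.meta['id'], and the pipeline would be enabled through the ITEM_PIPELINES setting of the project.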
