How to access all scraped items in Scrapy item pipeline?
Question
I have an item with a rank field that has to be built by analyzing the other scraped items. I don't want to use a database or another backend to store them. I just need to access all the items scraped so far and do some itertools magic on them. How can I do this after the spider finishes but before the data is exported (so that the rank field isn't empty)?
Recommended answer
I think signals might help. I did something similar here:
https://github.com/dm03514/CraigslistGigs/blob/master/craigslist_gigs/pipelines.py
It seems kind of hacky, but in your spider you can create an attribute that stores all your scraped items. In your pipeline you can register a method to be called on the spider_closed signal. This method takes the spider instance as a parameter, so you can then access the spider attribute that contains all your scraped items.
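A minimal sketch of that idea follows. It is not the code from the linked repository. The names `RankingPipeline`, `assign_ranks`, `scraped_items`, the `score` field, and the `ranked_items.jl` output file are all hypothetical, chosen only for illustration. It also assumes you write the final output yourself in the signal handler, since Scrapy's built-in feed exports write each item as it is scraped, before any rank could be computed.

```python
import itertools
import json


def assign_ranks(items, key="score"):
    """Sort items by `key` (highest first) and set item["rank"].

    Items with equal values share a rank (1, 2, 2, 4, ...).
    The "score" field is a hypothetical example field.
    """
    ordered = sorted(items, key=lambda it: it[key], reverse=True)
    rank = 1
    for _, group in itertools.groupby(ordered, key=lambda it: it[key]):
        group = list(group)
        for it in group:
            it["rank"] = rank
        rank += len(group)
    return ordered


class RankingPipeline:
    """Buffers every item on the spider, then ranks them when it closes."""

    output_path = "ranked_items.jl"  # hypothetical output file

    @classmethod
    def from_crawler(cls, crawler):
        # Imported here so the ranking logic above can be used without Scrapy.
        from scrapy import signals

        pipeline = cls()
        crawler.signals.connect(pipeline.spider_closed,
                                signal=signals.spider_closed)
        return pipeline

    def process_item(self, item, spider):
        # Store items on the spider so the signal handler can reach them.
        if not hasattr(spider, "scraped_items"):
            spider.scraped_items = []
        spider.scraped_items.append(item)
        return item

    def spider_closed(self, spider):
        # Called once, after the last item has passed through process_item.
        ranked = assign_ranks(getattr(spider, "scraped_items", []))
        with open(self.output_path, "w", encoding="utf-8") as f:
            for item in ranked:
                f.write(json.dumps(dict(item)) + "\n")
```

To try it, the pipeline would need to be enabled in the project's `ITEM_PIPELINES` setting under whatever module path you place it in. Keeping every item in memory is fine for small crawls but may not suit very large ones.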