Returning Items in scrapy's start_requests()
Question
I am writing a scrapy spider that takes many URLs as input and classifies them into categories (returned as items). These URLs are fed to the spider via my crawler's start_requests() method.
Some URLs can be classified without downloading them, so I would like to yield an Item for them directly in start_requests(), which is forbidden by scrapy. How can I circumvent this?
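To illustrate what "classified without downloading" might look like, here is a minimal sketch of a URL-only classifier; the category rules and the function name are made up for illustration:

```python
from urllib.parse import urlparse

def classify_without_download(url):
    """Return a category for URLs that can be classified from the URL
    alone, or None if the page would have to be downloaded first.
    (The rules below are hypothetical examples.)"""
    parsed = urlparse(url)
    if parsed.path.lower().endswith(".pdf"):
        return "document"
    if parsed.netloc.endswith("youtube.com"):
        return "video"
    return None  # cannot decide from the URL alone

print(classify_without_download("https://example.com/paper.pdf"))  # document
print(classify_without_download("https://example.com/page.html"))  # None
```

URLs for which this returns a category are exactly the ones the question would like to turn into items without issuing a request.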
I have thought about catching these requests in a custom middleware that would turn them into spurious Response objects, which I could then convert into Item objects in the request callback, but any cleaner solution would be welcome.
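The interception idea can be sketched with plain stand-in classes in place of scrapy's Request/Response (in real scrapy, a downloader middleware whose process_request returns a Response short-circuits the download; everything below is a simplified mock, not scrapy's actual API):

```python
class FakeRequest:
    """Stand-in for scrapy.Request."""
    def __init__(self, url, callback, meta=None):
        self.url, self.callback, self.meta = url, callback, meta or {}

class FakeResponse:
    """Stand-in for a spurious scrapy Response built without downloading."""
    def __init__(self, url, meta):
        self.url, self.meta = url, meta

class ShortCircuitMiddleware:
    """Returning a response from process_request skips the download."""
    def process_request(self, request):
        if request.meta.get("precomputed_category"):
            # Build a spurious response so the normal callback still runs.
            return FakeResponse(request.url, request.meta)
        return None  # let the download proceed as usual

def parse(response):
    # The callback turns the spurious response into an "item" (a dict here).
    return {"url": response.url,
            "category": response.meta["precomputed_category"]}

mw = ShortCircuitMiddleware()
req = FakeRequest("https://example.com/paper.pdf", parse,
                  meta={"precomputed_category": "document"})
resp = mw.process_request(req)
if resp is not None:
    item = req.callback(resp)
    print(item)  # {'url': 'https://example.com/paper.pdf', 'category': 'document'}
```

This keeps the item-producing logic in an ordinary callback, at the cost of an extra middleware and fake responses, which is exactly the clunkiness the question hopes to avoid.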
Recommended answer
I think using a spider middleware and overriding start_requests() would be a good start.
In your middleware, you should loop over all URLs in start_urls, and you could use conditional statements to deal with the different types of URLs.
- For special URLs that do not require a request, you can call your pipeline's process_item() directly; don't forget to import your pipeline and create a scrapy.Item from the URL for this.
- As you mentioned, pass the URL as meta in a Request, and have a separate parse function that only returns the URL.
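Putting the two suggestions together, here is a runnable sketch using plain-Python stand-ins for scrapy's Item, Request, and pipeline (names like UrlItem and CategoryPipeline, and the .pdf rule, are made up for illustration; in a real spider these would be your scrapy classes):

```python
class UrlItem(dict):
    """Stand-in for a scrapy.Item holding a url and its category."""

class CategoryPipeline:
    """Stand-in pipeline collecting processed items."""
    def __init__(self):
        self.items = []
    def process_item(self, item, spider=None):
        self.items.append(item)
        return item

class FakeRequest:
    """Stand-in for scrapy.Request carrying the url in its meta."""
    def __init__(self, url, meta):
        self.url, self.meta = url, meta

pipeline = CategoryPipeline()

def classify(url):
    # Hypothetical rule: .pdf URLs can be classified without downloading.
    return "document" if url.endswith(".pdf") else None

def start_requests(start_urls):
    for url in start_urls:
        category = classify(url)
        if category is not None:
            # Special URL: build an item and feed the pipeline directly,
            # bypassing the request/response cycle entirely.
            pipeline.process_item(UrlItem(url=url, category=category))
        else:
            # Normal URL: pass it along in meta; a separate parse
            # callback would then return it as usual.
            yield FakeRequest(url, meta={"url": url})

requests = list(start_requests(["https://a.com/x.pdf", "https://b.com/page"]))
print(len(pipeline.items), len(requests))  # 1 1
```

The key point of the answer is that the conditional lives in start_requests(): classifiable URLs never become requests at all, while the rest flow through scrapy's normal download-and-parse path.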