How can I instruct Scrapy to not serialize an item field?
Question
As a learning experiment to familiarize myself with Scrapy, I'm writing a scraper that checks all the links on an HTML page and reports the status codes of HTTP HEAD requests sent to them. In one of my item definitions I have a field, parent_url, that I treat as metadata: I do not want it to appear in my scraper's output.
parent_url is defined in the LinkItem class as follows:
from scrapy import Item, Field

class LinkItem(Item):
    name = Field()
    url = Field()
    parent_url = Field()  # Identifies what URL this item was extracted from
    status_code = Field()
To omit parent_url from my spider's output, I've tried:
- Defining parent_url in __init__ as an instance attribute: exceptions were raised when I tried to access it.
- Assigning to self["parent_url"] inside __init__: as the documentation already notes, Scrapy does not allow assignment to undeclared fields.
- Assigning Field(serializer=None) or Field(serializer=empty_function) to parent_url: this raised exceptions continuously while scraping and produced JSON output containing only commas.
I haven't found a solution yet, so I'm looking for outside help. The parent_url field is used internally within a pipeline, and I don't know what else to replace it with.
Recommended answer
You can specify which fields should be exported via the FEED_EXPORT_FIELDS setting. For example:
# in `settings.py`
FEED_EXPORT_FIELDS = ['name', 'url', 'status_code']
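To make the effect concrete, here is a minimal plain-Python sketch of what a field whitelist does to an item at export time. It is not Scrapy's own code: export_view and the sample item are hypothetical names made up for illustration, and the item is modelled as an ordinary dict. The point is that parent_url stays on the item (so a pipeline can still read it) and is only left out of the exported view.

```python
import json

# Same whitelist as in settings.py above.
FEED_EXPORT_FIELDS = ["name", "url", "status_code"]


def export_view(item, fields=FEED_EXPORT_FIELDS):
    """Hypothetical helper: keep only whitelisted fields, in whitelist order."""
    return {key: item[key] for key in fields if key in item}


# Hypothetical item as a pipeline would see it; parent_url is still present.
item = {
    "name": "Example",
    "url": "http://example.com/a",
    "parent_url": "http://example.com/",
    "status_code": 200,
}

exported = export_view(item)
print(json.dumps(exported))
```

The original item is left unmodified, which is why this approach suits metadata fields that pipelines need but the output should not contain.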