Scrapy: CSV output without header

Question

When I use the command scrapy crawl <project> -o <filename.csv>, I get the output of my Item dictionary with headers. This is good. However, I would like Scrapy to omit the headers if the file already exists. Is Scrapy capable of doing this, or do I need to implement that functionality myself?

Recommended answer

CsvItemExporter has an include_headers_line=True option, but I don't know how to set it directly. See http://doc.scrapy.org/en/latest/topics/exporters.html#csvitemexporter

However, you can create your own exporter with include_headers_line=False in a file named exporters.py (in the same folder as settings.py and items.py):

from scrapy.exporters import CsvItemExporter


class HeadlessCsvItemExporter(CsvItemExporter):

    def __init__(self, *args, **kwargs):
        kwargs['include_headers_line'] = False
        super(HeadlessCsvItemExporter, self).__init__(*args, **kwargs)

Then you have to register this exporter in settings.py:

FEED_EXPORTERS = {
    'csv': 'your_project_name.exporters.HeadlessCsvItemExporter',
}

Now Scrapy should write the CSV file without headers:

scrapy crawl <project> -o <filename.csv>

Alternatively, you can register the exporter under its own format name:

FEED_EXPORTERS = {
    'headless': 'your_project_name.exporters.HeadlessCsvItemExporter',
}

and use this exporter only when you pass -t headless:

scrapy crawl <project> -o <filename.csv> -t headless

P.S. Don't forget to replace your_project_name in settings.py with the name of your own project.
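As a possible alternative to the custom class, the header can likely be switched off through feed settings alone. The fragment below is only a sketch: it assumes Scrapy 2.4 or newer, where the FEEDS setting accepts an item_export_kwargs entry, and the output path items.csv is a hypothetical name. Unlike the file.tell() check, it drops the header unconditionally, even when the file is new.

```python
# settings.py (sketch; assumes Scrapy 2.4+ where FEEDS supports item_export_kwargs)
FEEDS = {
    # "items.csv" is a hypothetical output path used for illustration
    "items.csv": {
        "format": "csv",
        # Keyword arguments forwarded to the item exporter (CsvItemExporter here)
        "item_export_kwargs": {"include_headers_line": False},
    },
}
```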

Edit: the exporter now skips the headers only if the file is not empty (if file.tell() > 0):

from scrapy.exporters import CsvItemExporter


class HeadlessCsvItemExporter(CsvItemExporter):

    def __init__(self, *args, **kwargs):

        # args[0] is (opened) file handler
        # if file is not empty then skip headers
        if args[0].tell() > 0:
            kwargs['include_headers_line'] = False

        super(HeadlessCsvItemExporter, self).__init__(*args, **kwargs)
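The tell() check can also be tried outside Scrapy. The following is a minimal sketch using only the standard library csv module; the function name append_rows and the sample field names are hypothetical and not part of Scrapy. It relies on the fact that a file opened in append mode is positioned at its end, so tell() returns 0 only for an empty file.

```python
import csv
import os
import tempfile


def append_rows(path, fieldnames, rows):
    """Append dict rows to a CSV file, writing the header only if the file is empty."""
    # Append mode positions the stream at the end of the file,
    # so tell() == 0 means nothing has been written yet.
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if f.tell() == 0:
            writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    # Simulate two separate crawl runs appending to the same file.
    path = os.path.join(tempfile.mkdtemp(), "items.csv")
    append_rows(path, ["name", "price"], [{"name": "a", "price": 1}])
    append_rows(path, ["name", "price"], [{"name": "b", "price": 2}])
    with open(path, newline="", encoding="utf-8") as f:
        print(f.read())
```

The second call finds a non-empty file and therefore writes only the data row, which is the same behaviour the exporter above aims for across repeated runs.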
