Modifying CSV export in scrapy

Question

I seem to be missing something very simple. All I want to do is use a semicolon (;) as the delimiter in the CSV exporter instead of a comma (,).

I know the CSV exporter passes kwargs to the csv writer, but I can't figure out how to pass it the delimiter.
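
The delimiter is an ordinary keyword argument of Python's standard csv.writer, which is what CsvItemExporter ultimately writes through. A minimal stdlib-only illustration, independent of Scrapy; write_rows is a hypothetical helper used only for this sketch:

```python
import csv
import io


def write_rows(rows, **writer_kwargs):
    """Write rows to an in-memory buffer and return the CSV text.

    Any keyword arguments (e.g. delimiter) are forwarded unchanged to
    csv.writer, the same mechanism CsvItemExporter relies on.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, **writer_kwargs)
    writer.writerows(rows)
    return buffer.getvalue()


rows = [["name", "price"], ["apple", "1.5"]]
print(write_rows(rows))                 # comma-separated (the default)
print(write_rows(rows, delimiter=";"))  # semicolon-separated
```

Fields that contain the delimiter are quoted automatically, so a value such as "a;b" stays intact in the semicolon-separated output.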

I run my spider like this:

scrapy crawl spidername --set FEED_URI=output.csv --set FEED_FORMAT=csv 

Answer

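The delimiter never reaches the exporter. In contrib/feedexport.py, FeedExporter instantiates the exporter class with only the file, so no extra keyword arguments are passed through: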

class FeedExporter(object):

    ...

    def open_spider(self, spider):
        file = TemporaryFile(prefix='feed-')
        exp = self._get_exporter(file)  # <-- this is where the exporter is instantiated
        exp.start_exporting()
        self.slots[spider] = SpiderSlot(file, exp)

    def _get_exporter(self, *a, **kw):
        return self.exporters[self.format](*a, **kw)  # <-- not passed in :(

You will need to write your own exporter. Here's an example:

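# Note: scrapy.conf and scrapy.contrib.exporter are module paths from older
# Scrapy releases; newer releases provide CsvItemExporter in scrapy.exporters.
from scrapy.conf import settings
from scrapy.contrib.exporter import CsvItemExporter


class CsvOptionRespectingItemExporter(CsvItemExporter):

    def __init__(self, *args, **kwargs):
        # CSV_DELIMITER is a custom setting (not built into Scrapy);
        # fall back to a comma when it is not set.
        delimiter = settings.get('CSV_DELIMITER', ',')
        # CsvItemExporter forwards extra kwargs to csv.writer.
        kwargs['delimiter'] = delimiter
        super(CsvOptionRespectingItemExporter, self).__init__(*args, **kwargs)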
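
To see the pattern in isolation without a Scrapy project, here is a sketch in which BaseCsvExporter (hypothetical) stands in for CsvItemExporter by forwarding extra kwargs to csv.writer, and a plain dict SETTINGS (also hypothetical) stands in for the project settings. It is not Scrapy's code; it only mirrors how the subclass injects delimiter before calling the parent constructor, even when the exporter is created with nothing but a file, as in the FeedExporter excerpt.

```python
import csv
import io

# Hypothetical stand-in for the project settings; in Scrapy these would
# come from settings.py or --set on the command line.
SETTINGS = {"CSV_DELIMITER": ";"}


class BaseCsvExporter:
    """Hypothetical stand-in for CsvItemExporter: extra kwargs go to csv.writer."""

    def __init__(self, file, **kwargs):
        self.csv_writer = csv.writer(file, **kwargs)

    def export_item(self, item):
        # Write the item's values as one CSV row.
        self.csv_writer.writerow(list(item.values()))


class DelimiterRespectingExporter(BaseCsvExporter):
    """Reads the delimiter from SETTINGS and injects it into kwargs."""

    def __init__(self, file, **kwargs):
        # Fall back to a comma when the setting is absent, as in the answer.
        kwargs["delimiter"] = SETTINGS.get("CSV_DELIMITER", ",")
        super().__init__(file, **kwargs)


buffer = io.StringIO()
# Created with only the file, the way FeedExporter creates its exporter.
exporter = DelimiterRespectingExporter(buffer)
exporter.export_item({"name": "apple", "price": "1.5"})
print(buffer.getvalue())
```

Because the delimiter is read inside __init__, the caller never has to pass it, which is exactly what makes the subclass work despite FeedExporter not forwarding any kwargs.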

In your project's settings.py, add the following (replace the placeholder with the actual import path of the class):

FEED_EXPORTERS = {
    'csv': 'importable.path.to.CsvOptionRespectingItemExporter',
}
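
FEED_EXPORTERS maps a format name to an importable dotted path, which Scrapy resolves to a class at runtime (Scrapy has its own helper for this, scrapy.utils.misc.load_object). Below is a rough stdlib approximation of that lookup using importlib; load_class is a hypothetical name, and the standard-library path csv.DictWriter stands in for importable.path.to.CsvOptionRespectingItemExporter so the sketch runs anywhere.

```python
import importlib


def load_class(dotted_path):
    """Resolve 'package.module.Name' to the object it names.

    Roughly approximates what Scrapy does with the values in FEED_EXPORTERS.
    """
    module_path, _, name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"not a full dotted path: {dotted_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, name)


# Hypothetical mapping mirroring FEED_EXPORTERS; a stdlib class is used so
# the example runs without a Scrapy project.
FEED_EXPORTERS = {"csv": "csv.DictWriter"}

exporter_cls = load_class(FEED_EXPORTERS["csv"])
print(exporter_cls)
```

The practical consequence is that the module containing your exporter must be importable from wherever Scrapy runs; a wrong path fails at import time rather than silently falling back to the default exporter.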

Now, you can execute your spider as follows:

scrapy crawl spidername --set FEED_URI=output.csv --set FEED_FORMAT=csv --set CSV_DELIMITER=';'
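
The quotes around ; are for the shell: unquoted, a POSIX shell would treat ; as the end of the command. The shell strips the quotes, so Scrapy receives the argument CSV_DELIMITER=;. Python's shlex.split applies the same quoting rules and can show this; the loop that collects the --set values into a dict is a hypothetical illustration, not Scrapy's own option parser.

```python
import shlex

command = (
    "scrapy crawl spidername --set FEED_URI=output.csv "
    "--set FEED_FORMAT=csv --set CSV_DELIMITER=';'"
)

# Split the command line the way a POSIX shell would, removing the quotes.
argv = shlex.split(command)

# Collect the value following each --set flag into a dict (illustrative only).
overrides = {}
for flag, value in zip(argv, argv[1:]):
    if flag == "--set":
        key, _, val = value.partition("=")
        overrides[key] = val

print(overrides)
```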

HTH.