How can I use the fields_to_export attribute in BaseItemExporter to order my Scrapy CSV data?


Problem description

I have made a simple Scrapy spider that I use from the command line to export my data into the CSV format, but the order of the data seems random. How can I order the CSV fields in my output?

I use the following command line to get CSV data:

scrapy crawl somewhere -o items.csv -t csv

According to this Scrapy documentation, I should be able to use the fields_to_export attribute of the BaseItemExporter class to control the order. But I am clueless how to use this, as I have not found any simple example to follow.

Please note: this question is very similar to THIS one. However, that question is over 2 years old and doesn't address the many recent changes to Scrapy, nor does it provide a satisfactory answer, as it requires hacking one or both of:

  • contrib/exporter/__init__.py
  • contrib/feedexport.py

to address some previous issues that seem to have already been resolved...

Many thanks.

Recommended answer

To use such an exporter you need to create your own item pipeline that will process your spider output. Assuming the simple case where you want all spider output in one file, this is the pipeline you should use (pipelines.py):

from scrapy import signals
from scrapy.contrib.exporter import CsvItemExporter  # in Scrapy >= 1.0 this moved to scrapy.exporters

class CSVPipeline(object):

  def __init__(self):
    self.files = {}

  @classmethod
  def from_crawler(cls, crawler):
    pipeline = cls()
    crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
    crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
    return pipeline

  def spider_opened(self, spider):
    file = open('%s_items.csv' % spider.name, 'w+b')  # the exporter expects a binary-mode file
    self.files[spider] = file
    self.exporter = CsvItemExporter(file)
    # Field names below are illustrative; list your item's fields in the desired column order.
    self.exporter.fields_to_export = ['name', 'price', 'url']
    self.exporter.start_exporting()

  def spider_closed(self, spider):
    self.exporter.finish_exporting()
    file = self.files.pop(spider)
    file.close()

  def process_item(self, item, spider):
    self.exporter.export_item(item)
    return item
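
For context, here is a hypothetical items.py that the pipeline above could work with; the field names defined here (purely illustrative) are the ones you would list in fields_to_export:

# items.py - hypothetical item definition; field names are illustrative
from scrapy.item import Item, Field

class ProductItem(Item):
  name = Field()
  price = Field()
  url = Field()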

Of course, you need to remember to add this pipeline to your configuration file (settings.py):

ITEM_PIPELINES = {'myproject.pipelines.CSVPipeline': 300}
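
As a side note: on recent Scrapy releases (1.1 and later) you can get ordered CSV columns without a custom pipeline at all, because the built-in feed exports honor the FEED_EXPORT_FIELDS setting. A minimal sketch, assuming the same illustrative field names as above (settings.py):

# settings.py - exported columns will appear in exactly this order; names are illustrative
FEED_EXPORT_FIELDS = ['name', 'price', 'url']

With that in place, a plain scrapy crawl somewhere -o items.csv produces the columns in the listed order.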
