Python Scrapy: How to get CSVItemExporter to write columns in a specific order


Problem description

In Scrapy, I have my items specified in a certain order in items.py, and my spider populates those items in the same order. However, when I run the spider and save the results as a CSV, the column order from items.py or the spider is not maintained. How can I get the CSV to show columns in a specific order? Example code would be much appreciated.

Thanks.

Recommended answer

Related: Modifying CSV export in Scrapy

The problem is that the exporter is instantiated without any keyword arguments, so settings such as EXPORT_FIELDS are ignored. The solution is the same: you need to subclass the CSV item exporter so that it passes the keyword arguments itself.

Following the above recipe, I created a new file xyzzy/feedexport.py (change "xyzzy" to the name of your Scrapy project package):

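"""
The standard CsvItemExporter class does not pass the kwargs through to the
CSV writer, resulting in EXPORT_FIELDS and EXPORT_ENCODING being ignored
(EXPORT_EMPTY is not used by CSV).
"""

# These import paths match the older Scrapy releases this recipe was written
# for; later releases moved CsvItemExporter to scrapy.exporters.
from scrapy.conf import settings
from scrapy.contrib.exporter import CsvItemExporter


class CSVkwItemExporter(CsvItemExporter):
    """CSV exporter that takes its field list and encoding from settings."""

    def __init__(self, *args, **kwargs):
        # An empty or missing EXPORT_FIELDS falls back to None (export all fields).
        kwargs['fields_to_export'] = settings.getlist('EXPORT_FIELDS') or None
        kwargs['encoding'] = settings.get('EXPORT_ENCODING', 'utf-8')

        super(CSVkwItemExporter, self).__init__(*args, **kwargs)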
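
The mechanism used by the class above can be seen in isolation without Scrapy. The following is only an illustrative sketch: SimpleCsvExporter, OrderedCsvExporter, SETTINGS and export_to_string are hypothetical stand-ins built on the standard library csv.DictWriter, not Scrapy's own classes. It shows a subclass injecting a configured field list as a keyword argument, and how that list then fixes the column order.

```python
import csv
import io

# Hypothetical stand-in for the project's settings (not Scrapy's settings object).
SETTINGS = {"EXPORT_FIELDS": ["field1", "field2", "field3"]}


class SimpleCsvExporter:
    """Stand-in exporter that writes dict items as CSV rows.

    It accepts the column list as a keyword argument; without one,
    the columns follow the key order of the first item.
    """

    def __init__(self, stream, fields_to_export=None):
        self.stream = stream
        self.fields_to_export = fields_to_export
        self._writer = None

    def export_item(self, item):
        if self._writer is None:
            # The column order is fixed when the header row is written.
            fields = self.fields_to_export or list(item.keys())
            self._writer = csv.DictWriter(
                self.stream, fieldnames=fields, extrasaction="ignore"
            )
            self._writer.writeheader()
        self._writer.writerow(item)


class OrderedCsvExporter(SimpleCsvExporter):
    """Injects the configured field list before calling the parent."""

    def __init__(self, *args, **kwargs):
        kwargs["fields_to_export"] = SETTINGS.get("EXPORT_FIELDS") or None
        super().__init__(*args, **kwargs)


def export_to_string(exporter_cls, items):
    """Run all items through an exporter and return the CSV text."""
    buffer = io.StringIO(newline="")
    exporter = exporter_cls(buffer)
    for item in items:
        exporter.export_item(item)
    return buffer.getvalue()


# The item's keys are deliberately out of the desired order.
items = [{"field3": "c", "field1": "a", "field2": "b"}]
unordered = export_to_string(SimpleCsvExporter, items)
ordered = export_to_string(OrderedCsvExporter, items)
```

Here unordered keeps the item's own key order in its header row, while ordered follows the list in SETTINGS, which is the effect the subclass is meant to have on the real exporter.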

Then I registered it in xyzzy/settings.py:

FEED_EXPORTERS = {
    'csv': 'xyzzy.feedexport.CSVkwItemExporter'
}

Now the CSV exporter will honor the EXPORT_FIELDS setting. Also add this to xyzzy/settings.py:

# By specifying the fields to export, the CSV export honors the order
# rather than using a random order.
EXPORT_FIELDS = [
    'field1',
    'field2',
    'field3',
]
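
As an aside that was not part of the original answer: Scrapy 1.0 and later document a built-in FEED_EXPORT_FIELDS setting that controls both which fields are exported and in what order. On such a release the custom exporter may not be needed at all. A minimal settings.py sketch, assuming such a release and reusing the placeholder field names from above:

```python
# settings.py fragment (assumes a Scrapy release that supports FEED_EXPORT_FIELDS).
# The list gives both the exported fields and their column order.
FEED_EXPORT_FIELDS = [
    "field1",
    "field2",
    "field3",
]
```

If the installed release predates this setting, the subclass approach described above remains the way to go.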
