How to get double quotes in Scrapy .csv results

Problem description

I have a problem with quoting in Scrapy's CSV output. I am trying to scrape data that contains commas, and this results in double quotes around some columns, like so:

TEST,TEST,TEST,ON,TEST,TEST,"$2,449,000, 4,735 Sq Ft, 6 Bed, 5.1 Bath, Listed 03/01/2016"
TEST,TEST,TEST,ON,TEST,TEST,"$2,895,000, 4,975 Sq Ft, 5 Bed, 4.1 Bath, Listed 01/03/2016"

Only columns that contain commas get double quoted. How can I double quote all my data columns?

I want Scrapy to output:

"TEST","TEST","TEST","ON","TEST","TEST","$2,449,000, 4,735 Sq Ft, 6 Bed, 5.1 Bath, Listed 03/01/2016"
"TEST","TEST","TEST","ON","TEST","TEST","$2,895,000, 4,975 Sq Ft, 5 Bed, 4.1 Bath, Listed 01/03/2016"

Are there any settings I can change to do this?

Recommended answer

By default, for CSV output, Scrapy uses csv.writer() with the defaults.

For field quoting, the default is csv.QUOTE_MINIMAL:

Instructs writer objects to only quote those fields which contain special characters such as delimiter, quotechar or any of the characters in lineterminator.
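
The difference between the two quoting modes can be seen with the standard library alone. The following is a minimal sketch, not Scrapy code: the sample row and the `render` helper are made up for illustration.

```python
import csv
import io

# Hypothetical sample row: only the last field contains the delimiter.
row = ["TEST", "ON", "$2,449,000, 4,735 Sq Ft"]


def render(quoting):
    """Write the sample row with the given quoting mode and return the text."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=quoting).writerow(row)
    return buffer.getvalue()


minimal = render(csv.QUOTE_MINIMAL)  # quotes only the field with commas
quote_all = render(csv.QUOTE_ALL)    # quotes every field

print(minimal, end="")
print(quote_all, end="")
```

With csv.QUOTE_MINIMAL only the third field is wrapped in quotes; with csv.QUOTE_ALL all three are.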

But you can build your own CSV item exporter and set a new dialect, building on the default 'excel' dialect.

For example, in an exporters.py module, define the following:

import csv

from scrapy.exporters import CsvItemExporter


class QuoteAllDialect(csv.excel):
    # Same as the default 'excel' dialect, but quote every field.
    quoting = csv.QUOTE_ALL


class QuoteAllCsvItemExporter(CsvItemExporter):

    def __init__(self, *args, **kwargs):
        # Force the quote-all dialect; it is passed on to csv.writer().
        kwargs.update({'dialect': QuoteAllDialect})
        super(QuoteAllCsvItemExporter, self).__init__(*args, **kwargs)
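
To sanity check the dialect on its own, without running a crawl, it can be handed straight to csv.writer(). This is a small sketch using only the standard library; the rows are placeholder data, and the class is repeated here so the snippet is self-contained.

```python
import csv
import io


class QuoteAllDialect(csv.excel):
    # Same dialect as in exporters.py: 'excel' defaults plus quote-all.
    quoting = csv.QUOTE_ALL


buffer = io.StringIO()
writer = csv.writer(buffer, dialect=QuoteAllDialect)
writer.writerow(["name", "title"])                   # header row
writer.writerow(["Some name", "Some title, baby!"])  # placeholder data

lines = buffer.getvalue().splitlines()
for line in lines:
    print(line)
```

Every field, including the header cells, comes out wrapped in double quotes.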

Then you simply need to reference this item exporter in your settings for CSV output, something like:

FEED_EXPORTERS = {
    'csv': 'myproject.exporters.QuoteAllCsvItemExporter',
}

And a simple spider like this:

import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ['http://example.com/']

    def parse(self, response):
        yield {
            "name": "Some name",
            "title": "Some title, baby!",
            "description": "Some description, with commas, quotes (\") and all"
        }

will output the following:

"description","name","title"
"Some description, with commas, quotes ("") and all","Some name","Some title, baby!"
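
Note that the embedded quote is written as a doubled quote (""), which is how the 'excel' dialect escapes the quote character. As a quick check, assuming the output above is fed back through the standard csv module, the original values come back intact:

```python
import csv
import io

# The exported text shown above, with the '\r\n' line endings csv.writer emits.
exported = (
    '"description","name","title"\r\n'
    '"Some description, with commas, quotes ("") and all",'
    '"Some name","Some title, baby!"\r\n'
)

# newline="" keeps the line endings untouched, as the csv module expects.
rows = list(csv.DictReader(io.StringIO(exported, newline="")))

for row in rows:
    print(row["description"])
```

The doubled quote is read back as a single double-quote character, and the commas inside the quoted fields do not split them.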
