Scrapy :: Issues with CSV exporting

Views: 169 | Posted: 2017/2/24 21:42:10 | Tags: python, csv, scrapy

Problem description

I am trying to use Scrapy to export scraped items into a CSV file with each field enclosed in double quotes. Currently, the CSV exports correctly, but when I modify the item fields and add double quotes manually, each modified field ends up enclosed in triple double quotes. Here is an example of what I'm trying to do:

import scrapy
from tutorial.items import StoreItem

class SecilSpider(scrapy.Spider):
    name = "secil"
    allowed_domains = ["secilstore.com"]
    def start_requests(self):
        start_urls = reversed(["http://www.secilstore.com/yeni_liste/Sayfa/{0}".format(page) for page in xrange(1,2)] + \
                     ["http://www.secilstore.com/yeni_liste/Magaza/Aksesuar_32/Sayfa/{0}".format(page) for page in xrange(1,2)] + \
                     ["http://www.secilstore.com/yeni_liste/Magaza/%C3%87anta_33/Sayfa/{0}".format(page) for page in xrange(1,2)])
        return [ scrapy.Request(url = start_url) for start_url in start_urls ]

    def parse(self, response):
        for url in response.xpath('//div[@class="image"]/a/@href').extract():
            yield scrapy.Request("http://www.secilstore.com" + url, callback = self.parse)
        baseUrl = response.request.headers.get('Referer', None)
        if baseUrl is not None:
            baseUrl = baseUrl.split('Sayfa')[0]
        color = response.xpath('//a[@class="renk"]/text()').extract()
        for c in color:
            # Create a fresh item per color so yielded items do not share state.
            item = StoreItem()
            item['url'] = baseUrl
            item['productUrl'] = response.url
            item['imageUrl'] = "http://www.secilstore.com" + response.xpath('//img[@id="productMainImage"]/@src').extract()[0]
            item['color'] = c
            item['price'] = response.xpath('//span[@class="price cufonHover"]/text()').extract()[0] + "TL"
            item['title'] = response.xpath('//h2[@class="cufon"]/text()').extract()
            item['brand'] = response.xpath('//h3[@class="slogan cufonSemi"]/text()').extract()[0]
            size = '|'.join(s.strip() for s in response.xpath('//a[@class="inStock"]/text()').extract())
            item['size'] = size if size else -1
            oldPrice = response.xpath('//div[@class="indirimFiyat"]/text()').extract()
            item['oldPrice'] = oldPrice[0] + "TL" if oldPrice else -1
            yield item

My CSV Item Pipeline

from scrapy import signals
from scrapy.exporters import CsvItemExporter

class CSVPipeline(object):

  def __init__(self):
    self.files = {}

  @classmethod
  def from_crawler(cls, crawler):
    pipeline = cls()
    crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
    crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
    return pipeline

  def spider_opened(self, spider):
    file = open('/home/ali/%s_items.csv' % spider.name, 'w+b')
    self.files[spider] = file
    # Note: the positional arguments map to include_headers_line=False and join_multivalued='"'.
    self.exporter = CsvItemExporter(file, False,'"')
    self.exporter.fields_to_export = ['url','productUrl','title','brand','imageUrl','price','oldPrice','color','size']
    self.exporter.start_exporting()

  def spider_closed(self, spider):
    self.exporter.finish_exporting()
    file = self.files.pop(spider)
    file.close()

  def process_item(self, item, spider):
    self.exporter.export_item(item)
    return item

So when I try to modify a field in the spider and add double quotes manually like this (for example, for item['url']):

item['url'] = '"%s"' % baseUrl

The resulting CSV contains the following:

"""http://www.secilstore.com/yeni_liste/Magaza/%C3%87anta_33""",http://www.secilstore.com/urun/5905b5c6b858458df3f4851d477eec1b/Secil-Kilit-Aksesuarli-Kisa-Sapli-Canta,Kilit Aksesuarlı Kısa Saplı Çanta,Seçil,http://www.secilstore.com/_docs/i400x500/a/a1894cadeb_Kilit-Aksesuarli-Kisa-Sapli-canta.jpg,"69,90TL","159,90TL",Ekru,-1

As you can see, the first field is surrounded by three pairs of double quotes instead of just one. Interestingly, the prices are also printed in double quotes. How can I surround each field with exactly one pair of double quotes?

Thanks!
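The behavior described in the question can be reproduced with Python's standard csv module alone, which CsvItemExporter uses to write rows. The following is a minimal Python 3 sketch, not the asker's pipeline: the helper write_row and the sample values are hypothetical and only modelled on the row shown above. It illustrates that the default quoting mode quotes only fields that need it (such as a price containing a comma), that a field which already contains quote characters gets wrapped and has its quotes doubled, and that csv.QUOTE_ALL wraps every field in a single pair of quotes.

```python
import csv
import io


def write_row(row, quoting):
    """Serialize one row with csv.writer and return the resulting text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=quoting, lineterminator="\n")
    writer.writerow(row)
    return buffer.getvalue()


# Hypothetical sample values modelled on the row in the question.
plain_row = ["http://example.com/list", "69,90TL", "Ekru", -1]
prequoted_row = ['"http://example.com/list"', "69,90TL", "Ekru", -1]

# Default mode (QUOTE_MINIMAL): only the field containing a comma is quoted.
minimal_plain = write_row(plain_row, csv.QUOTE_MINIMAL)

# A field that already contains quotes is wrapped in quotes and its own
# quotes are doubled, which produces the triple quotes seen in the question.
minimal_prequoted = write_row(prequoted_row, csv.QUOTE_MINIMAL)

# QUOTE_ALL wraps every field, including numbers, in one pair of quotes.
quote_all = write_row(plain_row, csv.QUOTE_ALL)

print(minimal_plain, minimal_prequoted, quote_all, sep="")
```

According to the Scrapy documentation, CsvItemExporter forwards leftover keyword arguments to csv.writer, so passing quoting=csv.QUOTE_ALL to the exporter and dropping the manually added quotes in the spider would be one way to get this result. Treat that as a suggestion to verify against the Scrapy version in use.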
