Scrapy :: Issues with CSV exporting

Question

I am trying to use Scrapy to export scraped items into a CSV file with each field enclosed in double quotes. Currently the CSV exports correctly, but when I modify the item fields and add double quotes manually, each of those fields ends up enclosed in triple double quotes. Here is an example of what I'm trying to do:

import scrapy
from tutorial.items import StoreItem

class SecilSpider(scrapy.Spider):
    name = "secil"
    allowed_domains = ["secilstore.com"]
    def start_requests(self):
        start_urls = reversed(["http://www.secilstore.com/yeni_liste/Sayfa/{0}".format(page) for page in xrange(1,2)] + \
                     ["http://www.secilstore.com/yeni_liste/Magaza/Aksesuar_32/Sayfa/{0}".format(page) for page in xrange(1,2)] + \
                     ["http://www.secilstore.com/yeni_liste/Magaza/%C3%87anta_33/Sayfa/{0}".format(page) for page in xrange(1,2)])
        return [ scrapy.Request(url = start_url) for start_url in start_urls ]

    def parse(self, response):
        # Follow every product link found on the listing page.
        for url in response.xpath('//div[@class="image"]/a/@href').extract():
            yield scrapy.Request("http://www.secilstore.com" + url, callback = self.parse)
        # The Referer header identifies the listing page that led here.
        baseUrl = response.request.headers.get('Referer', None)
        if baseUrl is not None:
            baseUrl = baseUrl.split('Sayfa')[0]
        color = response.xpath('//a[@class="renk"]/text()').extract()
        for c in color:
            # Create a fresh item per color so yielded items do not share state.
            item = StoreItem()
            item['url'] = baseUrl
            item['productUrl'] = response.url
            item['imageUrl'] = "http://www.secilstore.com" + response.xpath('//img[@id="productMainImage"]/@src').extract()[0]
            item['color'] = c
            item['price'] = response.xpath('//span[@class="price cufonHover"]/text()').extract()[0] + "TL"
            item['title'] = response.xpath('//h2[@class="cufon"]/text()').extract()
            item['brand'] = response.xpath('//h3[@class="slogan cufonSemi"]/text()').extract()[0]
            size = '|'.join(s.strip() for s in response.xpath('//a[@class="inStock"]/text()').extract())
            item['size'] = size if size else -1
            oldPrice = response.xpath('//div[@class="indirimFiyat"]/text()').extract()
            item['oldPrice'] = oldPrice[0] + "TL" if oldPrice else -1
            yield item



My CSV Item Pipeline

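from scrapy import signals
from scrapy.exporters import CsvItemExporter  # scrapy.contrib.exporter in Scrapy releases before 1.0

class CSVPipeline(object):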

  def __init__(self):
    self.files = {}

  @classmethod
  def from_crawler(cls, crawler):
    pipeline = cls()
    crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
    crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
    return pipeline

  def spider_opened(self, spider):
    file = open('/home/ali/%s_items.csv' % spider.name, 'w+b')
    self.files[spider] = file
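    # The third positional argument is join_multivalued (the separator used to
    # join list values), not a quote character.
    self.exporter = CsvItemExporter(file, False, '"')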
    self.exporter.fields_to_export = ['url','productUrl','title','brand','imageUrl','price','oldPrice','color','size']
    self.exporter.start_exporting()

  def spider_closed(self, spider):
    self.exporter.finish_exporting()
    file = self.files.pop(spider)
    file.close()

  def process_item(self, item, spider):
    self.exporter.export_item(item)
    return item


So when I try to modify a field in the spider and add double quotes manually like this (for example, for item['url']):

item['url'] = '"%s"' % baseUrl


the resulting CSV prints out the following:

"""http://www.secilstore.com/yeni_liste/Magaza/%C3%87anta_33""",http://www.secilstore.com/urun/5905b5c6b858458df3f4851d477eec1b/Secil-Kilit-Aksesuarli-Kisa-Sapli-Canta,Kilit Aksesuarlı Kısa Saplı Çanta,Seçil,http://www.secilstore.com/_docs/i400x500/a/a1894cadeb_Kilit-Aksesuarli-Kisa-Sapli-canta.jpg,"69,90TL","159,90TL",Ekru,-1

As you can see, the first field is surrounded by three double quotes on each side instead of a single pair. Interestingly, the prices are also printed in double quotes even though I did not quote them. How can I surround each field with only one pair of double quotes?

Thanks!

Recommended answer

I solved it by modifying the CSV pipeline:

import csv  # needed for csv.QUOTE_ALL

self.exporter = CsvItemExporter(open(spider.name + ".csv", "wb"), False,
                                fields_to_export=self.fields_to_export, quoting=csv.QUOTE_ALL)


This allowed me to generate a CSV file with the fields in double quotes.
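
Why this works can be illustrated with the standard library csv module alone, since CsvItemExporter forwards leftover keyword arguments such as quoting to csv.writer. The following is a minimal sketch, not the exporter itself: write_row is a hypothetical helper introduced only for this illustration, and the row is a shortened version of the output shown in the question. With the default csv.QUOTE_MINIMAL, a field is quoted only when it contains the delimiter or a quote character, and any embedded quotes are doubled. That would explain both the triple quotes around the manually quoted URL and the quotes around prices that contain a comma.

```python
import csv
import io


def write_row(row, quoting):
    """Serialize one row with csv.writer and return the resulting line (hypothetical helper)."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=quoting, lineterminator="\n").writerow(row)
    return buffer.getvalue().rstrip("\n")


url = "http://www.secilstore.com/yeni_liste/Magaza/%C3%87anta_33"
price = "69,90TL"

# Default quoting (QUOTE_MINIMAL) with a manually quoted URL: the embedded
# quotes are doubled and the whole field is wrapped in quotes again.
# The price is quoted only because it contains the delimiter.
manual = write_row(['"%s"' % url, "Ekru", price], csv.QUOTE_MINIMAL)

# QUOTE_ALL with the raw, unmodified values: every field gets one pair of quotes.
quoted = write_row([url, "Ekru", price], csv.QUOTE_ALL)

print(manual)
print(quoted)
```

In other words, the item fields should be left unquoted in the spider, and the quoting should be delegated to the writer through quoting=csv.QUOTE_ALL.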
