Scrapy :: Issues with CSV exporting
Problem description
I am trying to use Scrapy to export scraped items to a CSV file with each field enclosed in double quotes. Currently the CSV exports correctly, but when I modify the item fields and add double quotes manually, the CSV ends up with each of those fields enclosed in triple double quotes. Here is an example of what I'm trying to do:
import scrapy
from tutorial.items import StoreItem


class SecilSpider(scrapy.Spider):
    name = "secil"
    allowed_domains = ["secilstore.com"]

    def start_requests(self):
        start_urls = reversed(["http://www.secilstore.com/yeni_liste/Sayfa/{0}".format(page) for page in xrange(1,2)] + \
                              ["http://www.secilstore.com/yeni_liste/Magaza/Aksesuar_32/Sayfa/{0}".format(page) for page in xrange(1,2)] + \
                              ["http://www.secilstore.com/yeni_liste/Magaza/%C3%87anta_33/Sayfa/{0}".format(page) for page in xrange(1,2)])
        return [ scrapy.Request(url = start_url) for start_url in start_urls ]

    def parse(self, response):
        item = StoreItem()
        # Follow every product link found on a listing page.
        for url in response.xpath('//div[@class="image"]/a/@href').extract():
            yield scrapy.Request("http://www.secilstore.com" + url, callback = self.parse)
        # The listing page that linked here, with the page suffix cut off.
        baseUrl = response.request.headers.get('Referer', None)
        if baseUrl is not None:
            baseUrl = baseUrl.split('Sayfa')[0]
        color = response.xpath('//a[@class="renk"]/text()').extract()
        # Emit one item per colour variant of the product.
        for c in color:
            item['url'] = baseUrl
            item['productUrl'] = response.url
            item['imageUrl'] = "http://www.secilstore.com" + response.xpath('//img[@id="productMainImage"]/@src').extract()[0]
            item['color'] = c
            item['price'] = response.xpath('//span[@class="price cufonHover"]/text()').extract()[0] + "TL"
            item['title'] = response.xpath('//h2[@class="cufon"]/text()').extract()
            item['brand'] = response.xpath('//h3[@class="slogan cufonSemi"]/text()').extract()[0]
            size = '|'.join(s.strip() for s in response.xpath('//a[@class="inStock"]/text()').extract())
            item['size'] = size if size else -1
            oldPrice = response.xpath('//div[@class="indirimFiyat"]/text()').extract()
            item['oldPrice'] = oldPrice[0] + "TL" if oldPrice else -1
            yield item
My CSV Item Pipeline
from scrapy import signals
# In Scrapy versions before 1.0 this module is scrapy.contrib.exporter.
from scrapy.exporters import CsvItemExporter


class CSVPipeline(object):

    def __init__(self):
        self.files = {}

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
        return pipeline

    def spider_opened(self, spider):
        file = open('/home/ali/%s_items.csv' % spider.name, 'w+b')
        self.files[spider] = file
        # Positional arguments: include_headers_line=False, join_multivalued='"'
        self.exporter = CsvItemExporter(file, False,'"')
        self.exporter.fields_to_export = ['url','productUrl','title','brand','imageUrl','price','oldPrice','color','size']
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item
So when I modify a field in the spider and add double quotes manually like this (for example, for item['url']):
item['url'] = '"%s"' % baseUrl
the resulting CSV contains the following:
"""http://www.secilstore.com/yeni_liste/Magaza/%C3%87anta_33""",http://www.secilstore.com/urun/5905b5c6b858458df3f4851d477eec1b/Secil-Kilit-Aksesuarli-Kisa-Sapli-Canta,Kilit Aksesuarlı Kısa Saplı Çanta,Seçil,http://www.secilstore.com/_docs/i400x500/a/a1894cadeb_Kilit-Aksesuarli-Kisa-Sapli-canta.jpg,"69,90TL","159,90TL",Ekru,-1
As you can see, the first field is surrounded by triple double quotes instead of a single pair. It is also interesting that the prices are printed in double quotes. How can I surround each field with exactly one pair of double quotes?
Thanks!
Recommended answer
I solved it by modifying the CSV item pipeline (CSVPipeline) so that the exporter is created like this:
# Needs `import csv` at the top of the pipeline module.
self.exporter = CsvItemExporter(open(spider.name+".csv", "w"), False,
                                fields_to_export=self.fields_to_export, quoting=csv.QUOTE_ALL)
This allowed me to generate a CSV file with the fields in double quotes.
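For anyone wondering where the triple quotes came from: as far as I can tell they are ordinary CSV escaping, not a Scrapy bug. CsvItemExporter writes through Python's csv module, whose default quoting (QUOTE_MINIMAL) only quotes a field when it contains a delimiter or a quote character, and doubles any embedded quotes. That would also explain why only the prices (which contain a comma) were quoted in the original output. Below is a minimal sketch using just the standard library; the helper name render_row and the example.com URL are made up for illustration and do not come from the question.

```python
import csv
import io


def render_row(row, quoting):
    """Write one row with csv.writer and return it as text (hypothetical helper)."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=quoting).writerow(row)
    return buffer.getvalue().rstrip("\r\n")


# Quotes added by hand: the writer sees a field that *contains* quote
# characters, so it wraps the field in quotes and doubles each embedded one.
manual = render_row(['"http://example.com/a"', "69,90TL", "Ekru"], csv.QUOTE_MINIMAL)
print(manual)   # """http://example.com/a""","69,90TL",Ekru

# Plain values plus QUOTE_ALL: every field gets exactly one pair of quotes.
quoted = render_row(["http://example.com/a", "69,90TL", "Ekru"], csv.QUOTE_ALL)
print(quoted)   # "http://example.com/a","69,90TL","Ekru"

# Reading the first line back shows the hand-added quotes were stored as data.
parsed_back = next(csv.reader([manual]))
print(parsed_back)   # ['"http://example.com/a"', '69,90TL', 'Ekru']
```

So the item values should stay unquoted in the spider, and the quoting should be left to the exporter via quoting=csv.QUOTE_ALL.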