Arranging one item per column in a row of a CSV file in Scrapy (Python)
Problem description
I have items scraped from a site, which I placed into JSON files like below:
    {
        "author": ["TIM ROCK"],
        "book_name": ["Truk Lagoon, Pohnpei & Kosrae Dive Guide"],
        "category": "Travel",
    }
    {
        "author": ["JOY"],
        "book_name": ["PARSER"],
        "category": "Accomp",
    }
I want to store them in a CSV file with one dictionary per row and one item per column, as below:
    | author   | book_name       | category |
    | TIM ROCK | Truk Lagoon ... | Travel   |
    | JOY      | PARSER          | Accomp   |
I am getting the items of one dictionary in one row, but with all the columns combined.
My pipeline.py code is:

    import csv

    class Blurb2Pipeline(object):

        def __init__(self):
            self.brandCategoryCsv = csv.writer(open('blurb.csv', 'wb'))
            self.brandCategoryCsv.writerow(['book_name', 'author', 'category'])

        def process_item(self, item, spider):
            self.brandCategoryCsv.writerow([item['book_name'].encode('utf-8'),
                                            item['author'].encode('utf-8'),
                                            item['category'].encode('utf-8'),
                                            ])
            return item
Solution
The gist is that this is very simple with csv.DictWriter:

    >>> inputs = [{
    ...     "author": ["TIM ROCK"],
    ...     "book_name": ["Truk Lagoon, Pohnpei & Kosrae Dive Guide"],
    ...     "category": "Travel",
    ... },
    ... {
    ...     "author": ["JOY"],
    ...     "book_name": ["PARSER"],
    ...     "category": "Accomp",
    ... }
    ... ]
    >>>
    >>> from csv import DictWriter
    >>> from cStringIO import StringIO
    >>>
    >>> buf=StringIO()
    >>> c=DictWriter(buf, fieldnames=['author', 'book_name', 'category'])
    >>> c.writeheader()
    >>> c.writerows(inputs)
    >>> print buf.getvalue()
    author,book_name,category
    ['TIM ROCK'],"['Truk Lagoon, Pohnpei & Kosrae Dive Guide']",Travel
    ['JOY'],['PARSER'],Accomp
It would be better to join those arrays on something, but since elements can be a list or a string, it's a bit tricky. Telling if something is a string or some-other-iterable is one of the few cases in Python where direct type-checking makes good sense.
    >>> for row in inputs:
    ...     for k, v in row.iteritems():
    ...         if not isinstance(v, basestring):
    ...             try:
    ...                 row[k] = ', '.join(v)
    ...             except TypeError:
    ...                 pass
    ...     c.writerow(row)
    ...
    >>> print buf.getvalue()
    author,book_name,category
    TIM ROCK,"Truk Lagoon, Pohnpei & Kosrae Dive Guide",Travel
    JOY,PARSER,Accomp
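The session above is Python 2 (iteritems, basestring, cStringIO, the print statement). For anyone on Python 3, the same idea might look roughly like the sketch below. The names flatten_row and CsvExportPipeline are hypothetical and not part of the original answer. The pipeline assumes Scrapy's usual open_spider / process_item / close_spider hooks and that each item can be converted with dict(item). Treat it as a starting point, not a tested drop-in.

```python
import csv
import io


def flatten_row(row, sep=", "):
    """Return a copy of row with list/tuple values joined into one string."""
    flat = {}
    for key, value in row.items():
        if isinstance(value, (list, tuple)):
            flat[key] = sep.join(str(v) for v in value)
        else:
            flat[key] = value
    return flat


class CsvExportPipeline:
    """Hypothetical item pipeline: one item per row, one field per column."""

    fieldnames = ["author", "book_name", "category"]

    def __init__(self, path="blurb.csv"):
        self.path = path

    def open_spider(self, spider):
        # newline="" lets the csv module control line endings itself.
        self.file = open(self.path, "w", newline="", encoding="utf-8")
        # extrasaction="ignore" skips fields not listed in fieldnames;
        # fields missing from an item are written as empty strings.
        self.writer = csv.DictWriter(
            self.file, fieldnames=self.fieldnames, extrasaction="ignore"
        )
        self.writer.writeheader()

    def process_item(self, item, spider):
        self.writer.writerow(flatten_row(dict(item)))
        return item

    def close_spider(self, spider):
        self.file.close()


# Stand-alone demo of the flattening step, mirroring the session above.
inputs = [
    {
        "author": ["TIM ROCK"],
        "book_name": ["Truk Lagoon, Pohnpei & Kosrae Dive Guide"],
        "category": "Travel",
    },
    {"author": ["JOY"], "book_name": ["PARSER"], "category": "Accomp"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["author", "book_name", "category"])
writer.writeheader()
for row in inputs:
    writer.writerow(flatten_row(row))
print(buf.getvalue())
```

Unlike the loop above, flatten_row builds a new dict and leaves the original item unchanged, which may matter if later pipelines still expect list values.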