Writing to separate columns instead of comma separated for CSV files in Scrapy
I am working with Scrapy and writing the data fetched from web pages into CSV files.
My pipeline code:
import csv


class CsvWriterPipeline(object):  # class name is illustrative; the post omits it
    def __init__(self):
        # Python 2: csv.writer expects a file opened in binary mode
        self.file_name = csv.writer(open('example.csv', 'wb'))
        self.file_name.writerow(['Title', 'Release Date', 'Director'])

    def process_item(self, item, spider):
        self.file_name.writerow([item['Title'].encode('utf-8'),
                                 item['Release Date'].encode('utf-8'),
                                 item['Director'].encode('utf-8'),
                                 ])
        return item
My output in the CSV file looks like this:
Title,Release Date,Director
And Now For Something Completely Different,1971,Ian MacNaughton
Monty Python And The Holy Grail,1975,Terry Gilliam and Terry Jones
Monty Python's Life Of Brian,1979,Terry Jones
.....
But is it possible to write Title and its values into one column, Release Date and its values into the next column, and Director and its values into the column after that (given that CSV means comma-separated values), in a CSV file formatted like the one below?
Title,                                       Release Date,  Director
And Now For Something Completely Different,  1971,          Ian MacNaughton
Monty Python And The Holy Grail,             1975,          Terry Gilliam and Terry Jones
Monty Python's Life Of Brian,                1979,          Terry Jones
Any help would be appreciated. Thanks in advance.
Update -- Code re-factored in order to:
- use a generator function as suggested by @madjar and
- fit more closely to the code snippet provided by the OP.
The Target Output
I am trying an alternative using texttable. It produces output identical to that in the question. This output may be written to a CSV file (the records will need massaging for the appropriate CSV dialect, and I cannot find a way to use csv.writer and still get the padded spaces in each field).
Title,                                       Release Date,  Director
And Now For Something Completely Different,  1971,          Ian MacNaughton
Monty Python And The Holy Grail,             1975,          Terry Gilliam and Terry Jones
Monty Python's Life Of Brian,                1979,          Terry Jones
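On the csv.writer point, one possibility (not part of the original answer) is to put the padding at the front of the following field, since the default csv dialect writes leading spaces without quoting them. The sketch below assumes Python 3; `pad_rows` is a hypothetical helper name, and the two-space gutter is an arbitrary choice.

```python
import csv
import io


def pad_rows(rows, gutter=2):
    """Prefix every field after the first with spaces so columns line up.

    Hypothetical helper: the padding sits after the comma of the previous
    field, so csv.writer can still be used to emit the rows.
    """
    rows = [[str(field) for field in row] for row in rows]
    n_cols = len(rows[0])
    # Width of each column, counting the comma csv.writer appends after it.
    widths = [max(len(row[i]) for row in rows) + 1 for i in range(n_cols)]
    padded = []
    for row in rows:
        out = [row[0]]
        for i in range(1, n_cols):
            # Spaces needed to fill up the previous column, plus the gutter.
            gap = widths[i - 1] - (len(row[i - 1]) + 1) + gutter
            out.append(" " * gap + row[i])
        padded.append(out)
    return padded


rows = [
    ["Title", "Release Date", "Director"],
    ["And Now For Something Completely Different", "1971", "Ian MacNaughton"],
    ["Monty Python And The Holy Grail", "1975", "Terry Gilliam and Terry Jones"],
    ["Monty Python's Life Of Brian", "1979", "Terry Jones"],
]

buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(pad_rows(rows))
text = buffer.getvalue()
print(text)

# Reading the file back: skipinitialspace=True drops the padding again.
restored = list(csv.reader(io.StringIO(text), skipinitialspace=True))
```

A file written this way is still ordinary CSV, but a reader that does not skip the initial spaces will see them as part of each value.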
The Code
Here is a sketch of the code you would need to produce the result above:
from texttable import Texttable

# ----------------------------------------------------------------
# Imagine data to be generated by Scrapy, for each record:
# a dictionary of three items. The first set of functions
# generates the data for use in the texttable function.
# Note: this sketch targets Python 2 (bytes + ',' fails on Python 3).

def process_item(item):
    # This massages each record in preparation for writing to csv
    item['Title'] = item['Title'].encode('utf-8') + ','
    item['Release Date'] = item['Release Date'].encode('utf-8') + ','
    item['Director'] = item['Director'].encode('utf-8')
    return item

def initialise_dataset():
    data = [{'Title' : 'Title',
             'Release Date' : 'Release Date',
             'Director' : 'Director'
            }, # first item holds the table header
            {'Title' : 'And Now For Something Completely Different',
             'Release Date' : '1971',
             'Director' : 'Ian MacNaughton'
            },
            {'Title' : 'Monty Python And The Holy Grail',
             'Release Date' : '1975',
             'Director' : 'Terry Gilliam and Terry Jones'
            },
            {'Title' : "Monty Python's Life Of Brian",
             'Release Date' : '1979',
             'Director' : 'Terry Jones'
            }
           ]
    data = [ process_item(item) for item in data ]
    return data

def records(data):
    for item in data:
        yield [item['Title'], item['Release Date'], item['Director'] ]

# this ends the data simulation part
# --------------------------------------------------------

def create_table(data):
    # Create the table
    table = Texttable(max_width=0)
    table.set_deco(Texttable.HEADER)
    table.set_cols_align(["l", "c", "c"])
    table.add_rows( records(data) )

    # split, remove the underlining below the header
    # and pull together again. Many ways of cleaning this...
    tt = table.draw().split('\n')
    del tt[1] # remove the line under the header
    tt = '\n'.join(tt)
    return tt

if __name__ == '__main__':
    data = initialise_dataset()
    table = create_table(data)
    print(table)
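To tie this back to the pipeline in the question, here is a minimal sketch that skips texttable and pads with `str.ljust` instead. It assumes Python 3 and string-valued fields; the class name `PaddedCsvPipeline`, its `path` argument, and the two-space gutter are hypothetical. Column widths are only known once every item has been seen, so the rows are buffered and written in `close_spider`. Fields are not quoted, so a value containing a comma would break the layout.

```python
import os
import tempfile


class PaddedCsvPipeline(object):
    """Hypothetical pipeline: buffers rows, then writes aligned columns."""

    fields = ["Title", "Release Date", "Director"]

    def __init__(self, path="example.csv"):
        self.path = path
        self.rows = [list(self.fields)]  # header row goes first

    def process_item(self, item, spider):
        # Only collect here; padding needs the widest value of each column.
        self.rows.append([item[name] for name in self.fields])
        return item

    def close_spider(self, spider):
        n_cols = len(self.fields)
        widths = [max(len(row[i]) for row in self.rows) for i in range(n_cols)]
        with open(self.path, "w", encoding="utf-8") as handle:
            for row in self.rows:
                # value + comma, padded to the column width plus two spaces
                cells = [(value + ",").ljust(width + 3)
                         for value, width in zip(row[:-1], widths)]
                handle.write("".join(cells) + row[-1] + "\n")


# Usage with plain dicts standing in for Scrapy items (no crawl needed).
path = os.path.join(tempfile.mkdtemp(), "example.csv")
pipeline = PaddedCsvPipeline(path)
for title, year, director in [
    ("And Now For Something Completely Different", "1971", "Ian MacNaughton"),
    ("Monty Python And The Holy Grail", "1975", "Terry Gilliam and Terry Jones"),
    ("Monty Python's Life Of Brian", "1979", "Terry Jones"),
]:
    pipeline.process_item(
        {"Title": title, "Release Date": year, "Director": director}, spider=None
    )
pipeline.close_spider(spider=None)

with open(path, encoding="utf-8") as handle:
    lines = handle.read().splitlines()
print("\n".join(lines))
```

Buffering keeps the code short but holds every row in memory until the spider closes, which may not suit very large crawls.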