Writing to separate columns instead of comma separated for csv files in scrapy


Problem description


I am working with scrapy and writing the data fetched from web pages into CSV files.

My pipeline code:

def __init__(self):
    self.file_name = csv.writer(open('example.csv', 'wb'))
    self.file_name.writerow(['Title', 'Release Date','Director'])

def process_item(self, item, spider):
    self.file_name.writerow([item['Title'].encode('utf-8'),
                             item['Release Date'].encode('utf-8'),
                             item['Director'].encode('utf-8'),
                             ])
    return item

And my output format in CSV file is:

Title,Release Date,Director
And Now For Something Completely Different,1971,Ian MacNaughton
Monty Python And The Holy Grail,1975,Terry Gilliam and Terry Jones
Monty Python's Life Of Brian,1979,Terry Jones
.....

But is it possible to write Title and its values into one column, Release Date and its values into the next column, and Director and its values into the column after that (CSV being comma-separated values), in a CSV file laid out like the format below?

        Title,                                 Release Date,            Director
And Now For Something Completely Different,      1971,              Ian MacNaughton
Monty Python And The Holy Grail,                 1975,     Terry Gilliam and Terry Jones
Monty Python's Life Of Brian,                    1979,              Terry Jones

Any help would be appreciated. Thanks in advance.

Solution

Update -- Code re-factored in order to:

  1. use a generator function as suggested by @madjar and
  2. fit more closely to the code snippet provided by the OP.

The Target Output

I am trying an alternative using texttable. It produces output identical to that in the question. This output may be written to a csv file (the records will need massaging for the appropriate csv dialect). I cannot find a way to keep using csv.writer and still get the padded spaces in each field.

                  Title,                      Release Date,             Director            
And Now For Something Completely Different,       1971,              Ian MacNaughton        
Monty Python And The Holy Grail,                  1975,       Terry Gilliam and Terry Jones 
Monty Python's Life Of Brian,                     1979,                Terry Jones    
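
The padding itself does not depend on texttable. As a rough sketch using only the standard library, the same kind of layout can be built with str.ljust and str.center. The helper name pad_rows, its aligns and gap parameters, and the ROWS sample are assumptions made for illustration, not part of the original answer. Because the comma is attached to each field before padding, every run of spaces follows a delimiter, so csv.reader can read the file back with skipinitialspace=True.

```python
import csv
import io

ROWS = [
    ["Title", "Release Date", "Director"],
    ["And Now For Something Completely Different", "1971", "Ian MacNaughton"],
    ["Monty Python And The Holy Grail", "1975", "Terry Gilliam and Terry Jones"],
    ["Monty Python's Life Of Brian", "1979", "Terry Jones"],
]


def pad_rows(rows, aligns=("l", "c", "c"), gap=2):
    """Return one padded text line per row (hypothetical helper)."""
    last = len(rows[0]) - 1
    # Attach the comma to every field except the last, so padding follows the comma.
    cells = [[f + "," if i < last else f for i, f in enumerate(row)] for row in rows]
    # Each column is as wide as its longest cell, comma included.
    widths = [max(len(row[i]) for row in cells) for i in range(last + 1)]
    lines = []
    for row in cells:
        padded = [
            cell.ljust(width) if align == "l" else cell.center(width)
            for cell, width, align in zip(row, widths, aligns)
        ]
        # Trailing spaces are dropped so the last field stays clean.
        lines.append((" " * gap).join(padded).rstrip())
    return lines


lines = pad_rows(ROWS)
text = "\n".join(lines) + "\n"

# Reading it back: skipinitialspace ignores the spaces that follow each comma.
parsed = list(csv.reader(io.StringIO(text), skipinitialspace=True))
```

Unlike the texttable output, this sketch aligns the header row the same way as the data rows, and it assumes no field contains a comma.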

The Code

Here is a sketch of the code you would need to produce the result above:

from texttable import Texttable

# ----------------------------------------------------------------
# Imagine data to be generated by Scrapy, for each record:
# a dictionary of three items. The first set of functions
# generate the data for use in the texttable function

def process_item(item):
    # This massages each record in preparation for writing to csv.
    # The values are kept as text: on Python 3, the bytes returned by
    # .encode('utf-8') cannot be concatenated with the ',' string.
    item['Title'] = item['Title'] + ','
    item['Release Date'] = item['Release Date'] + ','
    # Director is the last column and needs no trailing comma.
    return item

def initialise_dataset():
    data = [
        {'Title': 'Title',
         'Release Date': 'Release Date',
         'Director': 'Director'
         },  # first item holds the table header
        {'Title': 'And Now For Something Completely Different',
         'Release Date': '1971',
         'Director': 'Ian MacNaughton'
         },
        {'Title': 'Monty Python And The Holy Grail',
         'Release Date': '1975',
         'Director': 'Terry Gilliam and Terry Jones'
         },
        {'Title': "Monty Python's Life Of Brian",
         'Release Date': '1979',
         'Director': 'Terry Jones'
         }
    ]

    data = [ process_item(item) for item in data ]
    return data

def records(data):
    for item in data:
        yield [item['Title'], item['Release Date'], item['Director'] ]

# this ends the data simulation part
# --------------------------------------------------------

def create_table(data):
    # Create the table
    table = Texttable(max_width=0)
    table.set_deco(Texttable.HEADER)
    table.set_cols_align(["l", "c", "c"])
    # list() hands add_rows an indexable sequence rather than a generator
    table.add_rows(list(records(data)))

    # split, remove the underlining below the header
    # and pull together again. Many ways of cleaning this...
    tt = table.draw().split('\n')
    del tt[1] # remove the line under the header
    tt = '\n'.join(tt)
    return tt

if __name__ == '__main__':
    data = initialise_dataset()
    table = create_table(data)
    print(table)
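
To connect this back to the pipeline in the question, one possible approach is to buffer the rows and write the aligned file once the spider closes, since column widths are only known after every item has been seen. The sketch below assumes Python 3 and uses plain str.ljust instead of texttable so that it stays self-contained. The class name PaddedTablePipeline, its path argument, and the fields tuple are hypothetical. open_spider, process_item and close_spider are the standard Scrapy item pipeline hooks.

```python
import os
import tempfile


class PaddedTablePipeline:
    """Hypothetical pipeline: buffers items, writes an aligned table on close."""

    fields = ("Title", "Release Date", "Director")

    def __init__(self, path="example.csv"):
        self.path = path

    def open_spider(self, spider):
        self.rows = [list(self.fields)]  # first row is the header

    def process_item(self, item, spider):
        self.rows.append([str(item[name]) for name in self.fields])
        return item

    def close_spider(self, spider):
        # Width of each column: longest value plus one for the comma.
        widths = [max(len(row[i]) for row in self.rows) + 1
                  for i in range(len(self.fields))]
        with open(self.path, "w", encoding="utf-8") as handle:
            for row in self.rows:
                # zip stops at the shorter input, so the last field is not padded.
                cells = [(value + ",").ljust(width)
                         for value, width in zip(row[:-1], widths)]
                handle.write(" ".join(cells + [row[-1]]) + "\n")


# Driving the hooks by hand, outside Scrapy, to show the call order.
path = os.path.join(tempfile.mkdtemp(), "example.csv")
pipeline = PaddedTablePipeline(path)
pipeline.open_spider(None)
pipeline.process_item({"Title": "Monty Python's Life Of Brian",
                       "Release Date": "1979",
                       "Director": "Terry Jones"}, None)
pipeline.close_spider(None)
with open(path, encoding="utf-8") as handle:
    written = handle.read().splitlines()
```

Buffering keeps every row in memory until the spider finishes, which is acceptable for small crawls but worth reconsidering for large ones. In a real project the class would be enabled through the ITEM_PIPELINES setting like any other pipeline.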
