How to write scraped data into a CSV file in Scrapy?


Problem description

I am trying to scrape a website by extracting the sub-links and their titles, and then save the extracted titles and their associated links into a CSV file. I run the following code, the CSV file is created but it is empty. Any help?

My Spider.py file looks like this:

from scrapy import cmdline
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor

class HyperLinksSpider(CrawlSpider):
    name = "linksSpy"
    allowed_domains = ["some_website"]
    start_urls = ["some_website"]
    rules = (Rule(LinkExtractor(allow=()), callback='parse_obj', follow=True),)

    def parse_obj(self, response):
        items = []
        for link in LinkExtractor(allow=(), deny=self.allowed_domains).extract_links(response):
            item = ExtractlinksItem()
            for sel in response.xpath('//tr/td/a'):
                item['title'] = sel.xpath('/text()').extract()
                item['link'] = sel.xpath('/@href').extract()
            items.append(item)
            return items

cmdline.execute("scrapy crawl linksSpy".split())

My pipelines.py is:

import csv

class ExtractlinksPipeline(object):

    def __init__(self):
        self.csvwriter = csv.writer(open('Links.csv', 'wb'))

    def process_item(self, item, spider):
        self.csvwriter.writerow((item['title'][0]), item['link'][0])
        return item
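Beyond the empty-CSV question itself, the pipeline above has two likely failure points: on Python 3, opening the file with `'wb'` makes `csv.writer` fail when writing strings, and the `writerow` call closes its parentheses too early, so `item['link'][0]` is passed as a stray second argument. A corrected sketch, keeping the question's `Links.csv` filename and `title`/`link` fields and using Scrapy's standard `open_spider`/`close_spider` pipeline hooks:

```python
import csv

class ExtractlinksPipeline(object):
    """Sketch of a corrected CSV pipeline (assumes Python 3 and items
    carrying 'title' and 'link' fields, as in the question's items.py)."""

    def open_spider(self, spider):
        # Text mode with newline='' is what the csv module expects on Python 3.
        self.file = open('Links.csv', 'w', newline='')
        self.csvwriter = csv.writer(self.file)

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        # writerow takes a single iterable: pass both values in one tuple.
        self.csvwriter.writerow((item['title'][0], item['link'][0]))
        return item
```

`open_spider` and `close_spider` are called once per crawl, which avoids leaving the file handle open for the life of the process as the `__init__` version does.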

My items.py is:

import scrapy

class ExtractlinksItem(scrapy.Item):
    # define the fields for your item here like:
    title = scrapy.Field()
    link = scrapy.Field()
    pass

I have also changed my settings.py:

ITEM_PIPELINES = {'extractLinks.pipelines.ExtractlinksPipeline': 1}

Recommended answer

To output all scraped data, Scrapy has a built-in feature called Feed Exports.
In short, all you need are two settings in your settings.py file: FEED_FORMAT - the format in which the feed should be saved, csv in your case, and FEED_URI - the location where the feed should be saved, e.g. ~/my_feed.csv.
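A minimal sketch of those two settings (the output filename is just an example; the spider name follows the question):

```python
# settings.py -- Scrapy's built-in Feed Exports write the CSV for you,
# so no custom CSV pipeline is needed for this.
FEED_FORMAT = 'csv'      # serialize items as CSV
FEED_URI = 'links.csv'   # where to save the feed; created at crawl time
```

Equivalently, the `-o` command-line flag sets both at once: `scrapy crawl linksSpy -o links.csv`.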


My related answer covers it in greater detail with a use case:
https://stackoverflow.com/a/41473241/3737009
