Empty .json file


Problem description


I have written this short spider code to extract titles from the Hacker News front page (http://news.ycombinator.com/).

import scrapy

class HackerItem(scrapy.Item):  # declaring the item
    hackertitle = scrapy.Field()


class HackerSpider(scrapy.Spider):
    name = 'hackernewscrawler'
    allowed_domains = ['news.ycombinator.com']  # website we chose
    start_urls = ['http://news.ycombinator.com/']

    def parse(self, response):
        sel = scrapy.Selector(response)  # selector to help us extract the titles
        item = HackerItem()  # the item declared above

        # xpath of the titles
        item['hackertitle'] = sel.xpath("//tr[@class='athing']/td[3]/a[@href]/text()").extract()

        # printing titles using a print statement
        print(item['hackertitle'])

However, when I run the code with scrapy scrawl hackernewscrawler -o hntitles.json -t json, I get an empty .json file that does not have any content in it.

Answer

You should change the print statement to yield:

import scrapy

class HackerItem(scrapy.Item):  # declaring the item
    hackertitle = scrapy.Field()


class HackerSpider(scrapy.Spider):
    name = 'hackernewscrawler'
    allowed_domains = ['news.ycombinator.com']  # website we chose
    start_urls = ['http://news.ycombinator.com/']

    def parse(self, response):
        sel = scrapy.Selector(response)  # selector to help us extract the titles
        item = HackerItem()  # the item declared above

        # xpath of the titles
        item['hackertitle'] = sel.xpath("//tr[@class='athing']/td[3]/a[@href]/text()").extract()

        # yield the item so the feed exporter can collect it
        yield item

Then run:

scrapy crawl hackernewscrawler -o hntitles.json -t json
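The reason this works: Scrapy treats parse as a generator and its feed exporter collects whatever the method yields, whereas print only writes to stdout, so the -o file stays empty. A minimal plain-Python sketch of the difference (hypothetical title strings, no Scrapy required):

```python
import json

# Hypothetical titles standing in for what the XPath would extract.
TITLES = ["Title one", "Title two"]

def parse_with_print():
    # Printing sends text to stdout; the caller gets nothing back.
    for t in TITLES:
        print({"hackertitle": t})

def parse_with_yield():
    # yield turns the function into a generator; a consumer
    # (like Scrapy's feed exporter) can collect each item.
    for t in TITLES:
        yield {"hackertitle": t}

# Nothing to export from the print version...
parse_with_print()

# ...but the yield version produces items a consumer can serialize,
# which is essentially what -o hntitles.json does with yielded items.
collected = list(parse_with_yield())
print(json.dumps(collected))
```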
