Empty .json file
Problem description
I have written this short spider to extract titles from the Hacker News front page (http://news.ycombinator.com/).
import scrapy

class HackerItem(scrapy.Item):  # declaring the item
    hackertitle = scrapy.Field()

class HackerSpider(scrapy.Spider):
    name = 'hackernewscrawler'
    allowed_domains = ['news.ycombinator.com']  # website we chose
    start_urls = ['http://news.ycombinator.com/']

    def parse(self, response):
        sel = scrapy.Selector(response)  # selector to help us extract the titles
        item = HackerItem()  # the item declared above
        # xpath of the titles
        item['hackertitle'] = sel.xpath("//tr[@class='athing']/td[3]/a[@href]/text()").extract()
        # printing titles using a print statement
        print(item['hackertitle'])
However, when I run the code with scrapy crawl hackernewscrawler -o hntitles.json -t json
I get an empty .json file with no content in it.
Recommended answer
You should change the print statement to yield:
import scrapy

class HackerItem(scrapy.Item):  # declaring the item
    hackertitle = scrapy.Field()

class HackerSpider(scrapy.Spider):
    name = 'hackernewscrawler'
    allowed_domains = ['news.ycombinator.com']  # website we chose
    start_urls = ['http://news.ycombinator.com/']

    def parse(self, response):
        sel = scrapy.Selector(response)  # selector to help us extract the titles
        item = HackerItem()  # the item declared above
        # xpath of the titles
        item['hackertitle'] = sel.xpath("//tr[@class='athing']/td[3]/a[@href]/text()").extract()
        # yield the item so Scrapy's feed exporter receives it
        yield item
Then run:
scrapy crawl hackernewscrawler -o hntitles.json -t json
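The reason this fixes the empty file is that Scrapy's engine iterates over whatever parse() yields and passes each item to the feed exporter, whereas print() only writes to the console and the function returns None, so nothing reaches the exporter. Here is a minimal plain-Python sketch (no Scrapy required; the function names are made up for illustration) of that generator behaviour:

```python
# A parse() that only prints: side effect on the console, returns None,
# so a caller iterating over its result gets nothing to export.
def parse_with_print(titles):
    print(titles)

# A parse() that yields: it is a generator, and iterating over it
# hands each produced item back to the caller (Scrapy's engine).
def parse_with_yield(titles):
    yield {'hackertitle': titles}

collected = list(parse_with_yield(['Title A', 'Title B']))
print(collected)  # [{'hackertitle': ['Title A', 'Title B']}]

result = parse_with_print(['Title A', 'Title B'])
print(result)  # None -> the exporter would receive no items
```

On recent Scrapy versions the output format is inferred from the .json file extension, so the -t json flag can usually be omitted.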