Scrapy: use an Item and save data in a JSON file

Question

I want to use a Scrapy Item to hold and manipulate the scraped data, and save everything in a JSON file (using the JSON file like a database).

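# Spider class

import json

import scrapy


class Spider(scrapy.Spider):
    name = 'productpage'
    start_urls = ['https://www.productpage.com']

    # Note: self.storage is used below, but the question does not show where
    # it is created; it is presumably a Storage instance.

    def parse(self, response):
        for article in response.css('article'):
            link = article.css('a::attr(href)').get()
            id = link.split('/')[-1]
            title = article.css('a > span::attr(content)').get()
            # Product takes (name, id, title, size, link); the sizes are filled
            # in later by parse_product, so an empty list is passed here.
            product = Product(self.name, id, title, [], link)
            yield scrapy.Request('{}.json'.format(link), callback=self.parse_product, meta={'product': product})

        # Re-request the same page (bypassing the duplicate filter) to keep polling.
        yield scrapy.Request(url=response.url, callback=self.parse, dont_filter=True)

    def parse_product(self, response):
        product = response.meta['product']
        # response.text replaces the deprecated response.body_as_unicode().
        for size in json.loads(response.text):
            product.size.append(size['name'])

        if self.storage.update(product.__dict__):
            product.send('url')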


# STORAGE CLASS

class Storage:

    def __init__(self, name):
        self.name = name
        self.path = '{}.json'.format(self.name)
        self.load()  # Load the JSON database (load() is not shown in the question)

    def update(self, new_item):
        # .... do things and update data ...
        return True

# Product Class

class Product:

    def __init__(self, name, id, title, size, link):
        self.name = name
        self.id = id
        self.title = title
        self.size = list(size) if size else []  # copy any sizes passed in
        self.link = link

    def send(self, url):
        return  # send notify...


The Spider class searches for products on the main page listed in start_urls, then parses each product page to collect the sizes as well. Finally, it calls self.storage.update(product.__dict__) to check for updates and, if that returns True, sends a notification.

How can I implement an Item in my code? I thought I could put it in the Product class, but then I can't include the send method...

Recommended answer

You should define the Item you want, and yield it once it has been parsed.
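As a rough illustration of "define the item and yield it", here is a minimal sketch. It assumes a Scrapy version that accepts dataclass objects as items (to my knowledge 2.2 and later); a scrapy.Item subclass with scrapy.Field() attributes would be the classic alternative. ProductItem, the sample body and the sample URL path are hypothetical, and parse_product is written as a plain function so it runs without Scrapy: in a spider, body_text would be response.text and the item would come from response.meta['product'].

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ProductItem:
    """Hypothetical item with the same fields as the question's Product class."""
    name: str
    id: str
    title: str
    link: str
    size: List[str] = field(default_factory=list)

    def send(self, url):
        # A dataclass can still carry ordinary methods; the notification
        # logic itself is not shown in the question, so this is a stub.
        return None


def parse_product(item, body_text):
    """Mirror of the spider callback, minus the Scrapy response object."""
    for size in json.loads(body_text):
        item.size.append(size['name'])
    # Yielding the item hands it to Scrapy's pipelines and feed exporter.
    yield item


# Made-up sample data standing in for one product page's JSON body.
sample_body = '[{"name": "S"}, {"name": "M"}]'
item = ProductItem(name='productpage', id='42', title='Shirt',
                   link='https://www.productpage.com/p/42')
result = next(parse_product(item, sample_body))
print(json.dumps(asdict(result)))
```

Inside the real spider, the only change compared with the question's code would be ending parse_product with yield product instead of calling the storage directly.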

Finally, run the command: scrapy crawl [spider] -o xx.json
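The same export can also be configured in the project settings instead of on the command line. The fragment below is a sketch that assumes a Scrapy version with the FEEDS setting and the overwrite feed option; the file name products.json is a hypothetical choice.

```python
# settings.py (sketch): 'products.json' is a hypothetical output path.
FEEDS = {
    'products.json': {
        'format': 'json',
        'encoding': 'utf8',
        'overwrite': True,  # replace the file on each run instead of appending
    },
}
```

Overwriting matters for the json format in particular, because appending a second run to an existing file generally leaves it as invalid JSON.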

PS: Scrapy supports exporting to a JSON file by default.
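If the JSON file really has to behave like a small database (detect changes, then notify), one possible route is an item pipeline rather than the built-in exporter. The sketch below is an assumption, not the asker's Storage class: JsonStoragePipeline, its path argument and the changed list are hypothetical names, and it assumes every item carries an id field. It only uses the standard pipeline hooks open_spider, process_item and close_spider, and in a project it would be enabled through the ITEM_PIPELINES setting.

```python
import json
import os
import tempfile
from dataclasses import asdict, is_dataclass


class JsonStoragePipeline:
    """Hypothetical pipeline that keeps one JSON file keyed by item id."""

    def __init__(self, path='productpage.json'):
        self.path = path
        self.data = {}
        self.changed = []  # ids whose stored record changed during this run

    def open_spider(self, spider):
        # Load the existing "database" if the file is already there.
        if os.path.exists(self.path):
            with open(self.path, encoding='utf-8') as f:
                self.data = json.load(f)

    def process_item(self, item, spider):
        record = asdict(item) if is_dataclass(item) else dict(item)
        key = str(record['id'])
        if self.data.get(key) != record:
            self.data[key] = record
            self.changed.append(key)  # a notification could be sent here
        return item

    def close_spider(self, spider):
        # Write everything back once the crawl ends.
        with open(self.path, 'w', encoding='utf-8') as f:
            json.dump(self.data, f, ensure_ascii=False, indent=2)


# Quick check outside Scrapy, with plain dicts standing in for items.
path = os.path.join(tempfile.mkdtemp(), 'productpage.json')
pipeline = JsonStoragePipeline(path)
pipeline.open_spider(spider=None)
pipeline.process_item({'id': '42', 'title': 'Shirt', 'size': ['S']}, spider=None)
pipeline.process_item({'id': '42', 'title': 'Shirt', 'size': ['S']}, spider=None)
pipeline.close_spider(spider=None)
```

Only the first of the two identical items is recorded as a change, which is the point where the question's send call would fit.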
