How to fetch all data and parse it using meta in Scrapy?


Problem description

I want to save all of the data in one JSON file. How can I pass my parsed data along using meta? I am not sure whether my meta format is correct. In the end I want to yield everything (both the fields I pass through meta and the ones I parse in parse_v) into a JSON file. Please help me out with this.
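To get every yielded item into a single JSON file, Scrapy's feed exports can do the writing for you. A minimal sketch (the `items.json` filename is an arbitrary assumption), placed as a class attribute on the spider:

```python
# Feed-export configuration for a Scrapy spider (Scrapy >= 2.1).
# As a class attribute on the spider, it makes every yielded dict
# land in one JSON file instead of requiring a custom pipeline.
custom_settings = {
    "FEEDS": {
        "items.json": {        # hypothetical output path
            "format": "json",
            "encoding": "utf8",
            "overwrite": True,
        },
    },
}
```

The same effect is available from the command line with `scrapy crawl data -O items.json`.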

I have now added the full code, so hopefully you can spot my problem.

import json
import scrapy
import time
import chompjs
from scrapy import Request
from scrapy.crawler import CrawlerProcess


class TrendyolSpider(scrapy.Spider):
    name = 'data'
    start_urls = ['https://www.trendyol.com/join-us/straplez-firfirli-simli-astarli-triko-elbise-krem-p-41896200']


    def parse(self, response):  # Scrapy calls parse() for start_urls; final_parse was never invoked
        abc = response.xpath("//p/script[contains(@type,'application/ld+json')]/text()").extract_first()
        json_text = json.loads(abc)
        img = json_text.get('image')

        products = response.css('div.pd-app-container')
        for product in products:
            # no trailing commas here: "x = expr," would wrap each value in a 1-tuple
            category = product.css('div.breadcrumb>a:nth-child(3)+ a.breadcrumb-item span::text').get()
            product_name = product.css("h1.pr-new-br ::text").getall()
            price = product.css('div.pr-bx-nm  span.prc-org::text').get().replace("TL", "")
            discount_price = product.css('div.pr-bx-nm  span.prc-slg::text').get().replace("TL", "")
            brand = response.css("div.sl-nm a::text").get()
            image = img
            size = product.css("div.pr-in-at-sp ::text").getall()
            product_information = product.css("div.pr-in-dt-cn ::text").getall()
            product_features = product.css("div.pr-prop-content ::text").getall()


        all_info = response.xpath("//script[contains(@type,'application/javascript')]/text()").extract_first()
        product_json = chompjs.parse_js_object(all_info)
        ides = product_json['product']['productGroupId']

        varient_url = "https://public.trendyol.com/discovery-web-productgw-service/api/productGroup/" + str(ides)

        yield Request(url=varient_url, callback=self.parse_v, meta={
            'category': category,
            'product_name': product_name,
            'price': price,
            'discount_price': discount_price,
            'brand': brand,
            'image': image,
            'size': size,
            'product_information': product_information,
            'product_features': product_features,
        })

    def parse_v(self, response):
        json_tex5 = json.loads(response.body)
        dataa = json_tex5.get('result').get("slicingAttributes")[0].get("attributes")
        yield {
            'category': response.meta['category'],
            'product_name': response.meta['product_name'],
            'price': response.meta['price'],
            'discount_price': response.meta['discount_price'],
            'brand': response.meta['brand'],
            'image': response.meta['image'],
            'size': response.meta['size'],
            'product_information': response.meta['product_information'],
            'product_features': response.meta['product_features'],
            'renk': dataa
            }
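The ld+json handling in the spider above can be exercised offline with plain `json`; the payload below is a trimmed, hypothetical stand-in for what the `<script type="application/ld+json">` tag carries on a real product page:

```python
import json

# Hypothetical, trimmed ld+json payload; real Trendyol pages carry many more fields.
raw = '{"@type": "Product", "name": "Triko Elbise", "image": "https://cdn.dsmcdn.com/sample.jpg"}'
json_text = json.loads(raw)   # same call the spider makes on extract_first()
img = json_text.get("image")  # .get() returns None instead of raising if the key is absent
```

Using `.get('image')` rather than `['image']` keeps the spider alive on pages whose ld+json lacks an image field.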

Recommended answer

Here is the answer to your question:

If you want to transfer data from one parse method to another using meta, create a key for each value and inject each key-value pair into the Request through its meta argument. Then, in parse_v, read every value back with response.meta and yield it under a new output key, e.g. 'Category': response.meta['cat'].

class TrendyolSpider(scrapy.Spider):
    name = 'data'
    start_urls = [
        'https://www.trendyol.com/olalook/kadin-siyah-cepli-minik-beyaz-cicekli-klos-elbise-elb-19000480-p-6635101']

    def parse(self, response):
        text = response.xpath(
            "//p/script[contains(@type,'application/ld+json')]/text()").extract_first()
        json_text = json.loads(text)
        products = response.css('div.pd-app-container')
        for product in products:
            # no trailing commas: "x = expr," would silently wrap each value in a 1-tuple
            category = product.css(
                'div.breadcrumb>a:nth-child(3)+ a.breadcrumb-item span::text').get()
            product_name = product.css('div.pr-in-cn h1.pr-new-br::text').get() + " " + product.css(
                'div.pr-in-cn h1.pr-new-br span::text').get()
            price = product.css(
                'div.pr-bx-nm  span.prc-org::text').get().replace("TL", "")
            discount_price = product.css(
                'div.pr-bx-nm  span.prc-slg::text').get().replace("TL", "")
            brand = response.css("div.sl-nm a::text").get()
            image = json_text.get('image')
            size = product.css("div.pr-in-at-sp ::text").getall()
            product_information = product.css(
                "div.pr-in-dt-cn ::text").getall()
            product_features = product.css(
                "div.pr-prop-content ::text").getall()

        all_info = response.xpath(
            "//script[contains(@type,'application/javascript')]/text()").extract_first()
        product_json = chompjs.parse_js_object(all_info)
        ides = product_json['product']['productGroupId']

        varient_url = "https://public.trendyol.com/discovery-web-productgw-service/api/productGroup/" + \
            str(ides)

        # one key per value; every pair rides along on the Request via meta
        yield Request(
            url=varient_url,
            callback=self.parse_v,
            meta={'cat': category, 'pro_name': product_name, 'p': price,
                  'dis_price': discount_price, 'bra': brand, 'ima': image,
                  'si': size, 'product_info': product_information,
                  'features': product_features}
        )

    def parse_v(self, response):
        json_tex5 = json.loads(response.body)
        dataa = json_tex5.get('result').get(
            "slicingAttributes")[0].get("attributes")
        for i in dataa:
            # start_urls is a list, so index it before concatenating strings
            all_info = self.start_urls[0] + i['contents'][0]['url'] \
                + "https://cdn.dsmcdn.com" + i['contents'][0]['imageUrl'] \
                + i['contents'][0]['price']['discountedPrice']['text'] \
                + i['contents'][0]['price']['originalPrice']['text']

        yield {
            'Category': response.meta['cat'],
            'Product_name': response.meta['pro_name'],
            'Price': response.meta['p'],
            'Discount_price': response.meta['dis_price'],
            'Brand': response.meta['bra'],
            'Image': response.meta['ima'],
            'Size': response.meta['si'],
            'Product_information': response.meta['product_info'],
            'Product_features': response.meta['features'],
            'rank': all_info
        }
