Scrapy - Why Item Inside For Loop Has The Same Value While Accessed in Another Parser

Question

I want to scrape the link inside a for loop. Inside the loop I fill in an item and pass it to the callback function. But why does the item in the callback function always have the same value? This is my code:

import scrapy
import re
from scraper.product_items import Product

class ProductSpider(scrapy.Spider):
    name = "productspider"

    start_urls = [
        'http://www.website.com/category-page/',
    ]

    def parse(self, response):
        item = Product()
        for products in response.css("div.product-card"):
            link = products.css("a::attr(href)").extract_first()
            item['sku'] = products.css("div.product-card::attr(data-sku)").extract_first()
            item['price'] = products.css("div.product-card__old-price::text").extract_first()
            yield scrapy.Request(url = link, callback=self.parse_product_page, meta={'item': item})

    def parse_product_page(self, response):
        item = response.meta['item']
        item['image'] = response.css("div.productImage::attr(data-big)").extract_first()
        return item

The result is:

[
{"sku": "DI684OTAA55INNANID", "price": "725", "image": "http://website.com/image1.jpg"},
{"sku": "DI684OTAA55INNANID", "price": "725", "image": "http://website.com/image2.jpg"},
{"sku": "DI684OTAA55INNANID", "price": "725", "image": "http://website.com/image3.jpg"},
]

As you can see, sku and price have the same value in every result, but I want each result to have its own sku and price. If I instead yield the item directly from parse, with the code changed like this:

import scrapy
import re
from scraper.product_items import Product

class LazadaSpider(scrapy.Spider):
    name = "lazada"

    start_urls = [
        'http://www.lazada.co.id/beli-jam-tangan-kasual-pria/',
    ]

    def parse(self, response):
        item = Product()
        for products in response.css("div.product-card"):
            link = products.css("a::attr(href)").extract_first()
            item['sku'] = products.css("div.product-card::attr(data-sku)").extract_first()
            item['price'] = products.css("div.product-card__old-price::text").extract_first()
            yield item

Then the sku and price values are correct for each iteration:

[
{"sku": "CA199FA31FKAANID", "price": "299"},
{"sku": "SW437OTAA31QO3ANID", "price": "200"},
{"sku": "SW437OTAM1RAANID", "price": "235"},
]
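A plain-Python toy model suggests why this second version looks correct even though it still reuses one item: the shared object only causes trouble when it is read after the loop has moved on. The generator below is a hypothetical stand-in for parse() (no Scrapy involved), and the model assumes that Scrapy exports each directly yielded item before the loop resumes, whereas a request's callback runs later.

```python
import json

# Sample rows copied from the output above. parse_shared() is a hypothetical
# stand-in for the spider's parse(), not real Scrapy code.
CARDS = [
    {"sku": "CA199FA31FKAANID", "price": "299"},
    {"sku": "SW437OTAA31QO3ANID", "price": "200"},
    {"sku": "SW437OTAM1RAANID", "price": "235"},
]


def parse_shared(cards):
    """Like the original parse(): one dict is created once and reused."""
    item = {}
    for card in cards:
        item["sku"] = card["sku"]
        item["price"] = card["price"]
        yield item  # the same object is yielded on every iteration


# Read each item as soon as it is yielded, before the loop overwrites it
# (assumed to resemble exporting a directly yielded item).
immediate = [json.dumps(item) for item in parse_shared(CARDS)]

# Keep the references first and read them only after the loop has finished
# (resembles a callback that runs after parse() is done).
deferred = [json.dumps(item) for item in list(parse_shared(CARDS))]

print(immediate)  # three different rows
print(deferred)   # the last row, three times
```

Reading in the same pass gives three distinct rows; collecting the references first gives the last row three times, which is the pattern seen in the callback version.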

Solution

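You should create the item inside the for loop. Otherwise every iteration shares the same item object and only overwrites its values. Each request's meta holds a reference to that one object, not a copy, and by the time parse_product_page runs the loop has usually finished, so every callback sees the values written for the last product. So the correct code is: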

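def parse(self, response):
    for products in response.css("div.product-card"):
        item = Product()  # a fresh item for every product card
        link = products.css("a::attr(href)").extract_first()
        item['sku'] = products.css("div.product-card::attr(data-sku)").extract_first()
        item['price'] = products.css("div.product-card__old-price::text").extract_first()
        # Each request now carries its own item to parse_product_page
        yield scrapy.Request(url=link, callback=self.parse_product_page, meta={'item': item})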
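The corrected flow can be checked outside Scrapy with a small simulation. This is a sketch under stated assumptions: plain dicts stand in for Product, a (link, meta) tuple stands in for scrapy.Request, and the product and image URLs are hypothetical. Because every iteration builds a new dict, each callback receives its own item even though all callbacks run after the loop has finished.

```python
# Hypothetical product links; the SKUs and prices are sample rows from the question.
CARDS = [
    {"link": "http://www.website.com/product-1", "sku": "CA199FA31FKAANID", "price": "299"},
    {"link": "http://www.website.com/product-2", "sku": "SW437OTAA31QO3ANID", "price": "200"},
    {"link": "http://www.website.com/product-3", "sku": "SW437OTAM1RAANID", "price": "235"},
]


def parse(cards):
    """Stand-in for the corrected parse(): a new dict for every product card."""
    for card in cards:
        item = {"sku": card["sku"], "price": card["price"]}  # fresh object each time
        # Stand-in for scrapy.Request(url=link, callback=..., meta={'item': item})
        yield card["link"], {"item": item}


def parse_product_page(meta, image_url):
    """Stand-in for the callback: add the image to the item carried in meta."""
    item = meta["item"]
    item["image"] = image_url
    return item


# Schedule every "request" first, so the loop finishes before any callback runs,
# which is the situation that exposed the shared-item bug.
requests = list(parse(CARDS))
results = [
    parse_product_page(meta, f"http://website.com/image{n}.jpg")
    for n, (_link, meta) in enumerate(requests, start=1)
]

for row in results:
    print(row)
```

If an item really must be built once and reused as a template, passing a copy into meta (Scrapy Item objects provide a copy() method) avoids the same aliasing.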
