Scrapy - Why Item Inside For Loop Has The Same Value While Accessed in Another Parser
Problem description
I want to scrape the link inside a for loop. Inside the loop I fill in an item and pass it to the callback function. But why does the item in the callback function always have the same value? This is my code:
```python
import scrapy
import re
from scraper.product_items import Product


class ProductSpider(scrapy.Spider):
    name = "productspider"
    start_urls = [
        'http://www.website.com/category-page/',
    ]

    def parse(self, response):
        item = Product()
        for products in response.css("div.product-card"):
            link = products.css("a::attr(href)").extract_first()
            item['sku'] = products.css("div.product-card::attr(data-sku)").extract_first()
            item['price'] = products.css("div.product-card__old-price::text").extract_first()
            yield scrapy.Request(url=link, callback=self.parse_product_page, meta={'item': item})

    def parse_product_page(self, response):
        item = response.meta['item']
        item['image'] = response.css("div.productImage::attr(data-big)").extract_first()
        return item
```
The result is:

```
[
{"sku": "DI684OTAA55INNANID", "price": "725", "image": "http://website.com/image1.jpg"},
{"sku": "DI684OTAA55INNANID", "price": "725", "image": "http://website.com/image2.jpg"},
{"sku": "DI684OTAA55INNANID", "price": "725", "image": "http://website.com/image3.jpg"},
]
```
As you can see, sku and price have the same value in every item, but I want each item to have its own sku and price. If I instead yield the items directly from `parse`, changing the code like this:
```python
import scrapy
import re
from scraper.product_items import Product


class LazadaSpider(scrapy.Spider):
    name = "lazada"
    start_urls = [
        'http://www.lazada.co.id/beli-jam-tangan-kasual-pria/',
    ]

    def parse(self, response):
        item = Product()
        for products in response.css("div.product-card"):
            link = products.css("a::attr(href)").extract_first()
            item['sku'] = products.css("div.product-card::attr(data-sku)").extract_first()
            item['price'] = products.css("div.product-card__old-price::text").extract_first()
            yield item
```
Then the sku and price values are correct for each item:

```
[
{"sku": "CA199FA31FKAANID", "price": "299"},
{"sku": "SW437OTAA31QO3ANID", "price": "200"},
{"sku": "SW437OTAM1RAANID", "price": "235"},
]
```
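Why the direct-yield version appears to work can be illustrated with a minimal, stdlib-only sketch. It assumes, as the output above suggests, that each yielded item is serialized before the loop resumes. Plain dicts stand in for `Product`, and the sample cards are made up. The same object is still yielded on every pass, so any consumer that keeps references instead of serializing immediately sees only the last values.

```python
# Hypothetical sample data standing in for the product cards on the page.
cards = [
    {"sku": "A1", "price": "299"},
    {"sku": "B2", "price": "200"},
    {"sku": "C3", "price": "235"},
]


def parse_yield_shared(cards):
    """Mirrors the second spider: one dict is reused and yielded on every pass."""
    item = {}
    for card in cards:
        item["sku"] = card["sku"]
        item["price"] = card["price"]
        yield item  # the same object is yielded each time


# Consumer that snapshots each item immediately, before the loop resumes.
exported = [dict(item) for item in parse_yield_shared(cards)]

# Consumer that only keeps references and reads them after the loop ends.
collected = list(parse_yield_shared(cards))

print(exported)   # three distinct rows: A1/299, B2/200, C3/235
print(collected)  # three references to one dict holding C3/235
print(all(row is collected[0] for row in collected))  # True
```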
Recommended answer
You should create the item inside the for loop; otherwise every iteration shares the same item object and merely overwrites its values. So the correct code is:
```python
def parse(self, response):
    for products in response.css("div.product-card"):
        # A fresh item for every product card, so no two iterations share one object
        item = Product()
        link = products.css("a::attr(href)").extract_first()
        item['sku'] = products.css("div.product-card::attr(data-sku)").extract_first()
        item['price'] = products.css("div.product-card__old-price::text").extract_first()
        yield item
```
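The same reasoning applies to the original spider that yields `scrapy.Request`: `meta={'item': item}` stores a reference to the item, not a copy, so when `item = Product()` sits outside the loop, every request carries the same object. The sketch below is a stdlib-only simulation of that situation, not Scrapy code. Plain dicts stand in for `Product` and `Request.meta`, the callbacks are run only after the loop has finished (which is typically what happens while the requests are being downloaded), and the card data is made up.

```python
# Hypothetical sample data; plain dicts stand in for Product and Request.meta.
cards = [
    {"sku": "A1", "price": "299", "image": "image1.jpg"},
    {"sku": "B2", "price": "200", "image": "image2.jpg"},
    {"sku": "C3", "price": "235", "image": "image3.jpg"},
]


def parse_shared(cards):
    """Like the question's spider: one item is created before the loop."""
    pending = []
    item = {}
    for card in cards:
        item["sku"] = card["sku"]
        item["price"] = card["price"]
        # Like meta={'item': item}: this stores a reference, not a copy.
        pending.append({"item": item, "image": card["image"]})
    return pending


def parse_fresh(cards):
    """Like the answer: a new item is created on every iteration."""
    pending = []
    for card in cards:
        item = {}
        item["sku"] = card["sku"]
        item["price"] = card["price"]
        pending.append({"item": item, "image": card["image"]})
    return pending


def run_callbacks(pending):
    """Run every simulated callback only after parsing has finished."""
    results = []
    for meta in pending:
        item = meta["item"]
        item["image"] = meta["image"]  # plays the role of parse_product_page
        results.append(dict(item))  # snapshot, as if exported on return
    return results


for row in run_callbacks(parse_shared(cards)):
    print(row)  # sku and price are always C3 / 235; only image differs
for row in run_callbacks(parse_fresh(cards)):
    print(row)  # each row keeps its own sku, price and image
```

With the shared item, sku and price come from whichever card the loop processed last, while image still differs because each callback sets it just before returning the item; this is consistent with the output in the question. In the real spider, moving `item = Product()` inside the for loop, before the `scrapy.Request` is built, gives each request its own item.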