Scrapy Extract ld+JSON
Question
How do I extract the name and URL?
quotes_spiders.py
import scrapy
import json


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["http://www.lazada.com.my/shop-power-banks2/?price=1572-1572"]

    def parse(self, response):
        # Decode the text of the first JSON-LD script block into a dict.
        data = json.loads(response.xpath('//script[@type="application/ld+json"]//text()').extract_first())
        # How to extract the name and url?
        yield data
Data to extract
<script type="application/ld+json">{"@context":"https://schema.org","@type":"ItemList","itemListElement":[{"@type":"Product","image":"http://my-live-02.slatic.net/p/2/test-product-0601-7378-08684315-8be741b9107b9ace2f2fe68d9c9fd61a-webp-catalog_233.jpg","name":"test product 0601","offers":{"@type":"Offer","availability":"https://schema.org/InStock","price":"99999.00","priceCurrency":"RM"},"url":"http://www.lazada.com.my/test-product-0601-51348680.html?ff=1"}]}</script>
Recommended answer
This line of code already returns a dictionary with the data you want:
data = json.loads(response.xpath('//script[@type="application/ld+json"]//text()').extract_first())
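One caveat with that line: extract_first() returns None when no matching script tag is found, and json.loads raises an error on None or on malformed text. Below is a minimal sketch of a more defensive parse using only the standard library. The helper name parse_ld_json is hypothetical and is not part of Scrapy.

```python
import json


def parse_ld_json(raw):
    """Return the decoded JSON-LD payload, or None if it is absent or invalid."""
    # extract_first() gives None when the XPath matches nothing.
    if raw is None:
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # The script tag existed but did not contain valid JSON.
        return None
```

In the spider this could be called as parse_ld_json(response.xpath('//script[@type="application/ld+json"]//text()').extract_first()), checking for None before indexing into the result.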
All you need to do is index into data like this:
name = data['itemListElement'][0]['name']
url = data['itemListElement'][0]['url']
Because the structured data holds a list (itemListElement), index 0 only gives you the first entry, so check that you are referring to the correct product in the list.
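If you want every product rather than only the first, you can loop over the list instead of hard-coding an index. The following is a rough sketch, assuming the payload has the same shape as the sample in the question. The helper name iter_products and the filter on "@type" == "Product" are illustrative choices, not something the original answer specifies.

```python
def iter_products(data):
    """Yield a dict with name and url for each Product entry in an ItemList."""
    # .get() with a default avoids a KeyError if the key is missing.
    for item in data.get("itemListElement", []):
        # Assumption: only entries typed as Product are of interest.
        if item.get("@type") == "Product":
            yield {"name": item.get("name"), "url": item.get("url")}


# Sample payload trimmed from the JSON-LD block shown in the question.
sample = {
    "@context": "https://schema.org",
    "@type": "ItemList",
    "itemListElement": [
        {
            "@type": "Product",
            "name": "test product 0601",
            "url": "http://www.lazada.com.my/test-product-0601-51348680.html?ff=1",
        }
    ],
}

products = list(iter_products(sample))
```

Inside the spider's parse method, the same helper could be used with yield from iter_products(data), so that each product becomes its own scraped item.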