How to collect all of the anchor href using scrapy?
Problem description
I tried to find this in the scrapy shell:
$ scrapy shell https://www.trendyol.com/trendyol-man/antrasit-basic-erkek-bisiklet-yaka-oversize-kisa-kollu-t-shirt-tmnss21ts0811-p-90831387
>>> response.css("div.slick-track").getall()
The output shows everything except the anchor part. I need all of the image hrefs. Please help me solve this problem.
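For a static page, anchor hrefs are normally pulled with a selector such as `response.css("div.slick-track a::attr(href)").getall()`. As a stdlib-only sketch of that same idea (no Scrapy required; the HTML snippet here is invented for illustration):

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collect the href attribute of every <a> tag encountered."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

# Invented sample markup standing in for the product page.
html = '<div class="slick-track"><a href="/img/1.jpg"></a><a href="/img/2.jpg"></a></div>'
parser = AnchorCollector()
parser.feed(html)
print(parser.hrefs)  # -> ['/img/1.jpg', '/img/2.jpg']
```

This only works when the anchors are present in the downloaded HTML, which, as the answer below explains, is not the case here.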
Recommended answer
As mentioned by Fazlul, the data is generated dynamically (more specifically, only the images and reviews). Using Chrome dev tools you can easily find this API: https://public.trendyol.com/discovery-web-productgw-service/api/productGroup/68379869. Now you are good to go.
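The spider's parse step walks the API's JSON along result → slicingAttributes → attributes → contents → url. The exact response shape is not documented publicly, so the payload below is an assumption inferred from that access path, just to show the traversal and URL joining:

```python
import json

# Hypothetical payload shaped like the productGroup API response:
# the field names come from the spider code, the values are invented.
raw = """
{
  "result": {
    "slicingAttributes": [
      {
        "attributes": [
          {"contents": [{"url": "/antrasit-basic-p-90831387"}]},
          {"contents": [{"url": "/siyah-basic-p-90831388"}]}
        ]
      }
    ]
  }
}
"""

data = json.loads(raw)["result"]["slicingAttributes"][0]["attributes"]
urls = ["https://www.trendyol.com" + item["contents"][0]["url"] for item in data]
print(urls)
```

The relative `url` values are joined onto the site's domain to produce full product-image links, which is exactly what the spider below does with `self.domain_name`.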
Code
import json

import scrapy
from scrapy import Request


class Trendyol(scrapy.Spider):
    name = 'test'
    domain_name = "https://www.trendyol.com"

    def start_requests(self):
        # Hit the hidden JSON API directly instead of the HTML page.
        url = "https://public.trendyol.com/discovery-web-productgw-service/api/productGroup/68379869"
        yield Request(url=url, callback=self.parse)

    def parse(self, response):
        json_text = json.loads(response.body)
        data = json_text.get('result').get("slicingAttributes")[0].get("attributes")
        for i in data:
            # The API returns relative URLs; prepend the site domain.
            full_url = self.domain_name + i['contents'][0]['url']
            print(full_url)