How to collect all of the anchor hrefs using scrapy?


Question


I tried this in the scrapy shell:

scrapy shell "https://www.trendyol.com/trendyol-man/antrasit-basic-erkek-bisiklet-yaka-oversize-kisa-kollu-t-shirt-tmnss21ts0811-p-90831387"
>>> response.css("div.slick-track").getall()

The output shows everything except the anchor part. I need all of the image hrefs. Please help me solve this problem.
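On a static page, the selector `response.css("div.slick-track a::attr(href)").getall()` would collect those hrefs directly; it fails here only because the anchors are injected by JavaScript. The extraction logic itself can be sketched with the standard library alone (the markup below is made up to resemble the carousel):

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag, mirroring a::attr(href)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)


# Hypothetical markup resembling the slick-track carousel
html = '<div class="slick-track"><a href="/img1.jpg"></a><a href="/img2.jpg"></a></div>'
parser = HrefCollector()
parser.feed(html)
print(parser.hrefs)  # → ['/img1.jpg', '/img2.jpg']
```

If this returned an empty list against the live response, that would confirm the anchors are not in the server-rendered HTML at all, which is exactly the situation in the answer below.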

Answer

As Fazlul mentioned, the data is generated dynamically (more specifically, only the images and reviews). Using Chrome dev tools, you can easily find this API: https://public.trendyol.com/discovery-web-productgw-service/api/productGroup/68379869. Now you are good to go.

Code

import json

import scrapy
from scrapy import Request


class Trendyol(scrapy.Spider):
    name = 'test'
    domain_name = "https://www.trendyol.com"
    def start_requests(self):
        url = "https://public.trendyol.com/discovery-web-productgw-service/api/productGroup/68379869"

        yield Request(url=url, callback=self.parse)

    def parse(self, response):
        # The API returns JSON, not HTML, so parse the body directly
        json_text = json.loads(response.body)
        # Each attribute in the product group holds one variant's relative URL
        data = json_text.get('result').get("slicingAttributes")[0].get("attributes")
        for i in data:
            full_url = self.domain_name + i['contents'][0]['url']
            print(full_url)
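The chain of `get()` calls in `parse()` can be checked offline against a minimal payload shaped like the productGroup response (the field values below are invented for illustration; only the key names follow the spider above):

```python
import json

# Hypothetical payload mimicking the shape of the productGroup API response
payload = json.dumps({
    "result": {
        "slicingAttributes": [
            {"attributes": [
                {"contents": [{"url": "/variant-a-p-1"}]},
                {"contents": [{"url": "/variant-b-p-2"}]},
            ]}
        ]
    }
})

domain_name = "https://www.trendyol.com"
# Same navigation as the spider: result -> slicingAttributes[0] -> attributes
data = json.loads(payload).get("result").get("slicingAttributes")[0].get("attributes")
urls = [domain_name + item["contents"][0]["url"] for item in data]
print(urls)
# → ['https://www.trendyol.com/variant-a-p-1', 'https://www.trendyol.com/variant-b-p-2']
```

Note that `get()` returns `None` on a missing key, so a change in the API's field names would surface as an `AttributeError` partway down the chain rather than a clear message.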

