How to scrape data behind a "next" button with an ellipsis using Scrapy

Question

I need to keep following the "next" button in the pager (<1 2 3 ... 5>) to collect data from every page, but the page source provides no href links, and the pager also contains an ellipsis. Any ideas? Here's my code:

import scrapy


class Forever21SaleSpider(scrapy.Spider):
    # Class and spider name are illustrative; the original post showed only the two methods.
    name = 'forever21_sale'

    def start_requests(self):
        # (callback, URL) pairs to crawl
        urls = (
            (self.parse_2, 'https://www.forever21.com/us/shop/catalog/category/f21/sale'),
        )
        for cb, url in urls:
            yield scrapy.Request(url, callback=cb)

    def parse_2(self, response):
        # Each product tile on the listing page is a div.pi_container;
        # this only covers the first page of results.
        for product_item_forever in response.css('div.pi_container'):
            yield {
                'forever-title': product_item_forever.css('p.p_name::text').get(),
                'forever-regular-price': product_item_forever.css('span.p_old_price::text').get(),
                'forever-sale-price': product_item_forever.css('span.p_sale.t_pink::text').get(),
                'forever-photo-url': product_item_forever.css('img::attr(data-original)').get(),
                'forever-description-url': product_item_forever.css('a.item_slider.product_link::attr(href)').get(),
            }

Please help, thanks.

Answer

It seems this pagination makes an additional request to an API, so there are two ways to handle it:

  1. Use Splash/Selenium to render the pages, following QHarr's approach;
  2. Make the same calls to the API. Check the browser's developer tools: you will find a POST request to https://www.forever21.com/us/shop/Catalog/GetProducts carrying all the proper parameters (they are too long, so I will not post the full list here).
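For option 2, the ellipsis only hides intermediate page numbers; the last number in the pager still tells you how many pages exist. A workable pattern is therefore to read that highest page number and replay the API request once per page. The sketch below assumes the request body is JSON and that a single parameter carries the page number. The helper names and the parameter names (`pageNo`, `pageSize`, `category`) are hypothetical placeholders, to be replaced with whatever developer tools actually show for the GetProducts request.

```python
import json
from typing import Iterator, List


def total_pages_from_pager(labels: List[str]) -> int:
    """Return the highest page number in a pager such as '1 2 3 ... 5'.

    Non-numeric labels ('...', '<', '>') are ignored, so the ellipsis
    does not matter: the last numeric label gives the page count.
    """
    numbers = [int(label.strip()) for label in labels if label.strip().isdecimal()]
    if not numbers:
        raise ValueError("no numeric page labels found")
    return max(numbers)


def page_payloads(base_params: dict, total_pages: int, page_key: str) -> Iterator[str]:
    """Yield one JSON request body per page (assumes a JSON POST body).

    base_params is copied for each page so the caller's dict is never
    mutated; page_key is the (hypothetical) name of the page parameter.
    """
    for page in range(1, total_pages + 1):
        params = dict(base_params)
        params[page_key] = page
        yield json.dumps(params)


if __name__ == "__main__":
    pages = total_pages_from_pager(["1", "2", "3", "...", "5"])
    # Placeholder parameters; copy the real ones from developer tools.
    base = {"category": "sale", "pageSize": 30}
    for body in page_payloads(base, pages, page_key="pageNo"):
        print(body)
```

In a Scrapy spider, each body could be sent with `scrapy.Request(url, method='POST', body=body, headers={'Content-Type': 'application/json'}, callback=...)`. If the real request turns out to be form-encoded instead, `scrapy.FormRequest(url, formdata=...)` with string values is the closer fit. Either way, the response is then parsed as JSON rather than with the CSS selectors from the question.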
