How to scrape data using a next button with ellipsis in Scrapy
Problem description
I need to keep following the pagination next button <1 2 3 ... 5>, but the source provides no href links and the page numbers contain an ellipsis. Any ideas? Here is my code:
def start_requests(self):
    urls = (
        (self.parse_2, 'https://www.forever21.com/us/shop/catalog/category/f21/sale'),
    )
    for cb, url in urls:
        yield scrapy.Request(url, callback=cb)

def parse_2(self, response):
    for product_item_forever in response.css('div.pi_container'):
        forever_item = {
            'forever-title': product_item_forever.css('p.p_name::text').extract_first(),
            'forever-regular-price': product_item_forever.css('span.p_old_price::text').extract_first(),
            'forever-sale-price': product_item_forever.css('span.p_sale.t_pink::text').extract_first(),
            'forever-photo-url': product_item_forever.css('img::attr(data-original)').extract_first(),
            'forever-description-url': product_item_forever.css('a.item_slider.product_link::attr(href)').extract_first(),
        }
        yield forever_item
Please help, thank you.
Recommended answer
It seems this pagination uses an additional request to an API, so there are two options:
- Use Splash/Selenium to render the pages, as QHarr suggested;
- Make the same calls to the API yourself. Check the developer tools and you will find a POST request to https://www.forever21.com/us/shop/Catalog/GetProducts with all the proper parameters (they are too long, so I will not post the full list here).
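A minimal sketch of the second option. The parameter names below (`pageNo`, `pageSize`, `category`) are hypothetical placeholders; the real POST body and headers must be copied from the request visible in the browser's developer tools Network tab:

```python
import json

# Endpoint from the answer above; the payload shape is an assumption.
API_URL = "https://www.forever21.com/us/shop/Catalog/GetProducts"


def next_page_payload(page, page_size=60, category="sale"):
    """Build an (assumed) JSON body requesting one page of products.

    Replace the keys/values with the ones your own developer tools show.
    """
    return {
        "page": {"pageNo": page, "pageSize": page_size},
        "category": category,
    }


# Inside a Scrapy spider you would then POST this payload for each page,
# incrementing `page` until the API returns no more products, e.g.:
#
#   yield scrapy.Request(
#       API_URL,
#       method="POST",
#       body=json.dumps(next_page_payload(page)),
#       headers={"Content-Type": "application/json"},
#       callback=self.parse_api,
#   )
```

Because every page is fetched with the same URL and only the payload changes, the ellipsis in the visible pager <1 2 3 ... 5> no longer matters: you simply stop when a page comes back empty.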