How to scrape the items loaded via a "view more" button using Scrapy

Views: 25. Published: 2021/7/16 22:13:28. Tags: python, xpath, web-scraping, scrapy

Problem description

Below is the markup of the "View More Products.." button as shown in the browser inspector. My spider scrapes the items that are visible when the page first loads, but I also want it to scrape the items that only appear after the button is clicked. How can I do that?

<div id="view-more" class="p20px pt10px">
    <div id="view-more-loader" class="tac"></div>
    <a href="javascript:void(0);" onclick="add_more_product_classified();$('#load_more_a_id').hide();" class="xxxxlarge ffrc lightbginfo gbiwb bdr darkbdrinfo p10px20px db w180px m0a tac" id="load_more_a_id" style="display: block;"><b class="icon-refresh xsmall mr5px"></b>View More Products..</a>
</div>

My Scrapy code:

import scrapy


class DummymartSpider(scrapy.Spider):
    name = 'dummymart'
    # Should match the host used in start_urls.
    allowed_domains = ['dummymart.com']
    start_urls = ['https://www.dummymart.com/catalog/car-dvd-player_cid100001018.html']

    def parse(self, response):
        # Each XPath returns one list per field, in page order.
        products = response.xpath('//div[@class="attr"]/h2/a/@title').extract()
        companies = response.xpath('//div[@class="supplier"]/p/a/@title').extract()
        countries = response.xpath('//*[@class="location a-color-secondary"]/span/text()').extract()
        categories = response.xpath('//*[@class="attr category hide--mobile"]/span/a/text()').extract()

        # Pair the fields up by position to build one record per product.
        for product, company, country, category in zip(products, companies, countries, categories):
            yield {
                'Product': product,
                'Company': company,
                'Country': country,
                'Category': category,
            }

Solution

The usual approach to a problem like this is:

Open the Developer Tools in your browser.
Go to the Network panel so that you can see the requests the browser makes.
Click the "view more" button on the page and note which request the browser sends to fetch the additional items.
Make the same request from your spider.

This blog post may help you.
