BeautifulSoup - scrape webpage - dynamically loading page

Views: 252  Published: 2021/6/14 19:36:34  Tags: python parsing web-scraping beautifulsoup screen-scraping


Problem description

I want to scrape this webpage: https://www.justdial.com/Mumbai/Dairy-Product-Retailers-in-Thane/nct-10152687

I need the data for every store: its name, telephone number and address.

But I can only get the first 10 results, because the remaining items load only when you scroll down the page.

My code:

import requests
import bs4

# Download the listing page; without scrolling, only the first ~10 results are in this HTML.
crawl_url = requests.get('https://www.justdial.com/Mumbai/Dairy-Product-Retailers-in-Thane/nct-10152687',
                         headers={'User-Agent': 'Mozilla/5.0'})
crawl_url.raise_for_status()


soup = bs4.BeautifulSoup(crawl_url.text, 'lxml')

# Each <span class="jcn"> wraps a link to one store's detail page.
for elems in soup.find_all('span', class_="jcn"):
    select_a = elems.select('a')
    for links in select_a:
        href = links.get('href')
        res = requests.get(href, headers={'User-Agent': 'Mozilla/5.0'})

        # Parse the detail page and print the first name, phone and address element.
        xsoup = bs4.BeautifulSoup(res.text, 'lxml')

        Name = xsoup.select('.fn')
        tel = xsoup.select('.tel')
        add = xsoup.select('.adrstxtr')
        a = Name[0]
        b = tel[0]
        c = add[0]
        print(a.getText())
        print("--"*10)
        print(b.getText())
        print("--"*10)
        print(c.getText())
        print("=="*25)
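The code above indexes `Name[0]`, `tel[0]` and `add[0]` directly, so a detail page that lacks one of those classes raises `IndexError` and stops the whole loop. A minimal sketch of a safer lookup, assuming the class names `.fn`, `.tel` and `.adrstxtr` from the question (they may have changed on the live site); the helper name `first_text` and the sample HTML are made up for illustration:

```python
from bs4 import BeautifulSoup


def first_text(soup, selector):
    """Return the stripped text of the first element matching `selector`, or None."""
    element = soup.select_one(selector)
    return element.get_text(strip=True) if element is not None else None


# Hypothetical detail-page fragment with no address element.
html = """
<div>
  <span class="fn">Example Dairy</span>
  <span class="tel">022 1234 5678</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
record = {
    "name": first_text(soup, ".fn"),
    "tel": first_text(soup, ".tel"),
    "address": first_text(soup, ".adrstxtr"),  # missing here, so None instead of IndexError
}
print(record)
```

A missing field now shows up as `None` in the record, and the loop can move on to the next store.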

When you scroll down the page, more items load. So I want to know how I can get as many items as I want.
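Infinite scroll usually works by having the browser fetch the next batch of results in the background. If those batches are also reachable as ordinary numbered URLs, "any number of items" becomes "enough pages". A minimal sketch, assuming about 10 results per page (the figure observed in the question, not a documented limit) and the `/page-N` URL suffix used in the recommended answer; `pages_needed` and `listing_urls` are hypothetical helper names:

```python
import math

# Assumption: roughly 10 results per page, as observed in the question.
RESULTS_PER_PAGE = 10
BASE_URL = "https://www.justdial.com/Mumbai/Dairy-Product-Retailers-in-Thane/nct-10152687"


def pages_needed(wanted_items, per_page=RESULTS_PER_PAGE):
    """Number of listing pages to request to collect at least `wanted_items`."""
    return math.ceil(wanted_items / per_page)


def listing_urls(base_url, wanted_items, per_page=RESULTS_PER_PAGE):
    """Build the '/page-N' URLs for pages 1..pages_needed(wanted_items)."""
    count = pages_needed(wanted_items, per_page)
    return [f"{base_url}/page-{n}" for n in range(1, count + 1)]


for url in listing_urls(BASE_URL, 25):
    print(url)  # 25 items at 10 per page -> pages 1, 2 and 3
```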

I tried this, but I didn't quite understand it, and I also didn't get that POST method :/
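Pages that load more results on scroll often do it with a background request (sometimes a POST) whose real URL and form fields can be read from the Network tab of the browser's developer tools. The sketch below only shows how such a POST is built with `requests`; it uses `requests.Request(...).prepare()` so nothing is sent. The endpoint `https://example.com/load-more` and the `page`/`search` fields are hypothetical placeholders, not justdial's actual API:

```python
import requests

# Hypothetical endpoint and form fields (placeholders, not a real API):
# copy the actual values from the request shown in the browser's Network tab.
LOAD_MORE_URL = "https://example.com/load-more"
form_fields = {"page": 2, "search": "dairy"}

request = requests.Request(
    "POST",
    LOAD_MORE_URL,
    data=form_fields,  # a dict passed as data= is sent form-encoded
    headers={"User-Agent": "Mozilla/5.0"},
)
prepared = request.prepare()  # build the request without sending it

print(prepared.method)
print(prepared.headers["Content-Type"])
print(prepared.body)

# To actually send it: requests.Session().send(prepared)
```

Printing the prepared request is a quick way to check that your method, headers and body match what the browser sends before you hit the real server.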

If you need more info, tell me.

Recommended answer

The solution given by t.m.adam worked. Here is the code:

import requests
import bs4

def spider(max_pages):
    page = 1
    # Fetch listing pages 1..max_pages inclusive.
    while page <= max_pages:
        # Instead of scrolling, request the numbered /page-N listing pages directly.
        url = "https://www.justdial.com/Mumbai/Dairy-Product-Retailers-in-Thane/nct-10152687/page-%s" % page
        crawl_url = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
        crawl_url.raise_for_status()
        soup = bs4.BeautifulSoup(crawl_url.text, 'lxml')
        # Each <span class="jcn"> wraps a link to one store's detail page.
        for elems in soup.find_all('span', class_="jcn"):
            select_a = elems.select('a')
            for links in select_a:
                href = links.get('href')
                res = requests.get(href, headers={'User-Agent': 'Mozilla/5.0'})
                # Parse the detail page and print name, phone and address.
                xsoup = bs4.BeautifulSoup(res.text, 'lxml')
                Name = xsoup.select('.fn')
                tel = xsoup.select('.tel')
                add = xsoup.select('.adrstxtr')
                a = Name[0]
                b = tel[0]
                c = add[0]
                print(a.getText())
                print("--"*10)
                print(b.getText())
                print("--"*10)
                print(c.getText())
                print("=="*25)
        page += 1


spider(3)
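Two fragile spots in the answer are that `href` is passed straight to `requests.get` (which fails if the link is relative) and that the loop always requests `max_pages` pages even after the results run out. A minimal sketch that separates parsing from fetching so it can be checked on sample HTML, assuming the `span.jcn` structure and `/page-N` URLs from the answer; `detail_links`, `crawl_listing` and the fake pages are hypothetical, and the stop-at-first-empty-page rule is an added assumption, not part of the answer:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def detail_links(listing_html, base_url):
    """Return absolute detail-page URLs found in <span class="jcn"><a href=...>."""
    soup = BeautifulSoup(listing_html, "html.parser")
    # urljoin leaves absolute hrefs unchanged and resolves relative ones.
    return [urljoin(base_url, a["href"]) for a in soup.select("span.jcn a[href]")]


def crawl_listing(fetch, base_url, max_pages):
    """Walk base_url/page-1 .. page-max_pages, stopping at the first empty page.

    `fetch` is any callable that takes a URL and returns HTML text; passing it
    in keeps the pagination logic testable without network access.
    """
    links = []
    for page in range(1, max_pages + 1):
        page_url = f"{base_url}/page-{page}"
        found = detail_links(fetch(page_url), page_url)
        if not found:
            break  # no more results: stop instead of requesting empty pages
        links.extend(found)
    return links


# Offline usage example with made-up pages standing in for real responses.
BASE = "https://www.justdial.com/Mumbai/Dairy-Product-Retailers-in-Thane/nct-10152687"
fake_pages = {
    f"{BASE}/page-1": '<span class="jcn"><a href="/shop-a">A</a></span>',
    f"{BASE}/page-2": '<span class="jcn"><a href="https://www.justdial.com/shop-b">B</a></span>',
    f"{BASE}/page-3": "<p>No results</p>",
}
print(crawl_listing(lambda url: fake_pages.get(url, ""), BASE, 5))
```

Against the real site, `fetch` could be `lambda url: requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}).text`, and each returned link can then be parsed for name, phone and address as in the answer.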
