How to scrape a phone number with Python when it is only shown after a click

Question

I want to scrape a phone number, but the number is only displayed after a button is clicked. Is it possible to scrape it directly with Python? My code does retrieve the phone field, but the digits are masked with asterisks (***). This is the page I want to scrape: https://hipages.com.au/connect/abcelectricservicespl/service/126298. Please advise. Here is my code:

import requests
from bs4 import BeautifulSoup


def get_page(url):
    response = requests.get(url)

    if not response.ok:
        print('server responded:', response.status_code)
        # Return None explicitly; otherwise `soup` would be undefined here
        return None
    return BeautifulSoup(response.text, 'lxml')

def get_detail_data(soup):

    try:
        title = (soup.find('h1', class_="sc-AykKI",id=False).text)
    except:
        title = 'Empty Title'
    print(title)

    try:
        contact_person = (soup.findAll('span', class_="Contact__Item-sc-1giw2l4-2 kBpGee",id=False)[0].text)
    except:
        contact_person = 'Empty Person'
    print(contact_person)

    try:
        location = (soup.findAll('span', class_="Contact__Item-sc-1giw2l4-2 kBpGee",id=False)[1].text)
    except:
        location = 'Empty location'
    print(location)

    try:
        cell = (soup.findAll('span', class_="Contact__Item-sc-1giw2l4-2 kBpGee",id=False)[2].text)
    except:
        cell = 'Empty Cell No'
    print(cell)

    try:
        phone = (soup.findAll('span', class_="Contact__Item-sc-1giw2l4-2 kBpGee",id=False)[3].text)
    except:
        phone = 'Empty Phone No'
    print(phone)

    try:
        Verify_ABN = (soup.find('p', class_="sc-AykKI").text)
    except:
        Verify_ABN = 'Empty Verify_ABN'
    print(Verify_ABN)

    try:
        ABN = (soup.find('div', class_="box__Box-sc-1u3aqjl-0").find('a'))
    except:
        ABN = 'Empty ABN'
    print(ABN)



def main():
    #get data of detail page
    url = "https://hipages.com.au/connect/abcelectricservicespl/service/126298"
    #get_page(url)
    get_detail_data(get_page(url))



if __name__ == '__main__':
    main()

Recommended answer

import requests
from bs4 import BeautifulSoup
import re


def Main():
    r = requests.get(
        "https://hipages.com.au/connect/abcelectricservicespl/service/126298")
    soup = BeautifulSoup(r.text, 'html.parser')
    name = soup.find("h1", {'class': 'sc-AykKI'}).text
    print(name)
    person = soup.find(
        "span", {'class': 'Contact__Item-sc-1giw2l4-2 kBpGee'}).text.strip()
    print(person)
    addr = soup.findAll(
        "span", {'class': 'Contact__Item-sc-1giw2l4-2 kBpGee'})[1].text
    print(addr)
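    # The values sit in an escaped JSON string in the page source, so the
    # raw pattern \\" matches a literal backslash followed by a quote.
    print(re.search(r'phone\\":\\"(.*?)\\"', r.text).group(1))
    print(re.search(r'mobile\\":\\"(.*?)\\"', r.text).group(1))
    print(re.search(r'abn\\":\\"(.*?)\\"', r.text).group(1))
    print(re.search(r'website\\":\\"(.*?)\\"', r.text).group(1))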


Main()

Output:

ABC Electric Services p/l
Mal
222 Henry Lawson DRV, Georges Hall NSW 2198
1800 801 828
0408 600 950
37137808989
www.abcelectricservices.com.au
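
The click does not trigger a separate request for the number. The full value already ships in the page source, inside a JSON object stored as an escaped JavaScript string, which is why a regular expression over the raw response text can find it. The sketch below illustrates the matching on a made-up sample; `SAMPLE` and `extract_field` are hypothetical names used only for illustration, and the real page may be laid out differently.

```python
import re

# Hypothetical snippet imitating how the page embeds its state: a JSON
# object stored inside a JavaScript string, so every quote is escaped.
SAMPLE = r'window.__INITIAL_STATE__ = "{\"phone\":\"1800 801 828\",\"mobile\":\"0408 600 950\"}"'


def extract_field(text, key):
    """Return the value stored under `key` in escaped JSON text, or None."""
    # In the raw pattern, \\" matches a literal backslash followed by a quote.
    match = re.search(re.escape(key) + r'\\":\\"(.*?)\\"', text)
    return match.group(1) if match else None


print(extract_field(SAMPLE, "phone"))
print(extract_field(SAMPLE, "fax"))
```

Returning None for a missing key avoids the AttributeError that calling `.group(1)` on a failed `re.search` would raise.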

Or, if you would like to parse the full script:

import requests
from bs4 import BeautifulSoup
import pyjsparser
import json
import re


def Main():
    r = requests.get(
        "https://hipages.com.au/connect/abcelectricservicespl/service/126298")
    soup = BeautifulSoup(r.text, 'html.parser')
    phone = soup.findAll("script")[5]
    tree = pyjsparser.parse(phone.text)
    print(json.loads(tree["body"][0]["expression"]["right"]["value"]))


Main()

Another version:

import requests
from bs4 import BeautifulSoup
import re
import json


def Main():
    r = requests.get(
        "https://hipages.com.au/connect/abcelectricservicespl/service/126298")
    soup = BeautifulSoup(r.text, 'html.parser')
    data = soup.findAll("script")[5].text
    source = re.search(r'__INITIAL_STATE__\s*=\s*"({.*})', data).group(1)
    # Unescape \" to " (unless the backslash is itself escaped), then parse
    kuku = json.loads(re.sub(r'(?<!\\)\\"', '"', source))
    print(json.dumps(kuku, indent=4))


Main()
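
A possible variation, sketched here rather than taken from the original answer, is to locate the assignment by name instead of by script index and to decode the string with `json.loads` twice: once to unescape the string literal, once to parse the JSON it contains. This assumes the JavaScript string uses only JSON-compatible escapes. `HTML`, `load_initial_state`, and the `profile` key are hypothetical and stand in for the real page.

```python
import json
import re

# Hypothetical page fragment; the real key layout is an assumption.
HTML = (
    r'<script>window.__INITIAL_STATE__ = '
    r'"{\"profile\":{\"phone\":\"1800 801 828\",\"abn\":\"37137808989\"}}";'
    r'</script>'
)

# Capture the whole double-quoted string literal, including escape pairs.
STATE_RE = re.compile(r'__INITIAL_STATE__\s*=\s*("(?:[^"\\]|\\.)*")')


def load_initial_state(html):
    """Return the embedded state as a dict, or None if it is not found."""
    match = STATE_RE.search(html)
    if match is None:
        return None
    inner_json = json.loads(match.group(1))  # string literal -> JSON text
    return json.loads(inner_json)            # JSON text -> Python dict


state = load_initial_state(HTML)
print(state["profile"]["phone"])
```

Searching for the variable name keeps the code working if the site adds or reorders script tags, which a fixed index such as `[5]` would not survive.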
