Hidden phone number can't be scraped

Problem description

I'm having trouble extracting the phone number that appears after clicking the "llamar" button. So far I've tried XPath selectors with Selenium and also Beautiful Soup, but neither has worked. With a Selenium XPath selector I usually get an invalid selector error, and with BS4 I get AttributeError: 'NoneType' object has no attribute 'text'. I hope you can help me out!

Here is the URL of the page - https://www.milanuncios.com/venta-de-pisos-en-malaga-malaga/portada-alta-carlos-de-haya-carranque-386352344.htm

Here is the code I tried:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import pandas as pd
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import UnexpectedAlertPresentException

url = 'https://www.milanuncios.com/venta-de-pisos-en-malaga-malaga/portada-alta-carlos-de-haya-carranque-386352344.htm'
path = r'C:\Users\WL-133\anaconda3\Lib\site-packages\selenium\webdriver\chrome\chromedriver.exe'
path1 = r'C:\Users\WL-133\anaconda3\Lib\site-packages\selenium\webdriver\firefox'
# driver = webdriver.Chrome(path)
options = Options()
driver = webdriver.Chrome(path)
driver.get(url)

a = []

mah_div = driver.page_source
soup = BeautifulSoup(mah_div, features='lxml')

cookie_button = '//*[@id="sui-TcfFirstLayerModal"]/div/div/footer/div/button[2]'
btn_press = driver.find_element_by_xpath(cookie_button)
btn_press.click()

llam_button = '//*[@id="ad-detail-contact"]/a[2]'
llam_press = driver.find_element_by_xpath(llam_button)
llam_press.click()
time.sleep(10)

for item in soup.find_all("div", {"class": "contenido"}):
    a.append(item.find("div", {"class": "plaincontenido"}).text)

print(a)

Recommended answer

With Selenium, you need to click the button and then switch to the iframe before reading the number.

from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# `driver` is the WebDriver from the question, with the page already loaded.
# The 20-second timeout is an assumed value; adjust it as needed.
wait = WebDriverWait(driver, 20)

# Wait until the phone button is clickable, then click it
tel_button = wait.until(EC.element_to_be_clickable(
            (By.CSS_SELECTOR, ".def-btn.phone-btn")))
tel_button.click()
# Wait for the iframe and switch into it
wait.until(EC.frame_to_be_available_and_switch_to_it((By.ID, "ifrw")))
# Wait until the number is visible, then read its text
tel_number = wait.until(EC.visibility_of_element_located(
            (By.CSS_SELECTOR, ".texto>.telefonos"))).text

Note that I used much more stable locators.
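
One follow-up worth keeping in mind: once `frame_to_be_available_and_switch_to_it` succeeds, the driver stays focused on the iframe, so later lookups for elements in the main page will fail until you switch back with `driver.switch_to.default_content()`. Below is a minimal sketch of a helper that switches back automatically. `inside_frame` and `enter_frame` are hypothetical names introduced here for illustration; they are not part of Selenium.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, enter_frame):
    """Run a block inside an iframe, then return to the top-level document.

    `enter_frame` is a zero-argument callable that performs the switch, e.g.
    lambda: wait.until(
        EC.frame_to_be_available_and_switch_to_it((By.ID, "ifrw")))
    """
    # If the switch itself fails (for example with a timeout), the exception
    # propagates and no switch-back is attempted.
    enter_frame()
    try:
        yield driver
    finally:
        # Selenium call that leaves all frames and returns to the main page
        driver.switch_to.default_content()


# Usage sketch (assumes `driver`, `wait`, `EC` and `By` from the answer above):
#
# with inside_frame(driver, lambda: wait.until(
#         EC.frame_to_be_available_and_switch_to_it((By.ID, "ifrw")))):
#     tel_number = wait.until(EC.visibility_of_element_located(
#         (By.CSS_SELECTOR, ".texto>.telefonos"))).text
```

Because the switch back happens in a `finally` clause, the driver returns to the main document even if reading the number raises an exception.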
