Can't get all span tags inside a div element with BeautifulSoup

Problem description

I am scraping this site and need to get the salary value from it, as shown in the image.

I tried the following:

import requests
from bs4 import BeautifulSoup

result = requests.get("https://wuzzuf.net/jobs/p/xGYIYbJlYhsC-Senior-Python-Developer-Cairo-Egypt?o=1&l=sp&t=sj&a=python|search-v3|hpb")
page = result.content
soup = BeautifulSoup(page, "lxml")
salaries_div = soup.find_all("div", {"class": "css-rcl8e5"})
for span in salaries_div[3].select("span"):
    print(span)

But I only get this span:

<span class="css-wn0avc">Salary<!-- -->:</span>

My question is: why can't I get all the spans inside the div, and what should I do to get the salary value in this case?

Recommended answer

Beautiful Soup is only a parser: it works on the content you give it and has nothing to do with page retrieval or rendering. requests.get returns the HTML as the server sent it, so any span that the page adds with JavaScript after loading is missing from that content.
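To illustrate the point, here is a minimal sketch with made-up markup (the `salary` class and the script are assumptions for the demo, not the real page's HTML). The second span would only exist after a browser ran the script, and Beautiful Soup never runs it:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a browser would append a second span by running the
# script, but a parser only sees the static HTML.
static_html = """
<div class="salary">
  <span>Salary:</span>
  <script>
    var s = document.createElement("span");
    s.textContent = "Confidential";
    document.currentScript.parentNode.appendChild(s);
  </script>
</div>
"""

# html.parser is the stdlib-backed parser, so lxml is not needed for the demo
soup = BeautifulSoup(static_html, "html.parser")
spans = soup.select("div.salary span")
span_texts = [span.get_text() for span in spans]
print(span_texts)
```

Only the label span is found, which matches the symptom in the question.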

The solution that worked in my case was to use Selenium to get the JavaScript-rendered page.

Working code:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Selenium 4 style: the driver path is passed through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://wuzzuf.net/jobs/p/xGYIYbJlYhsC-Senior-Python-Developer-Cairo-Egypt?o=1&l=sp&t=sj&a=python|search-v3|hpb")

# page_source holds the DOM after the browser has run the page's JavaScript
page = driver.page_source
driver.quit()

soup = BeautifulSoup(page, "lxml")
salaries_div = soup.find_all("div", {"class": "css-rcl8e5"})
for span in salaries_div[3].select("span"):
    print(span)
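Generated class names such as css-rcl8e5 and a fixed index like [3] can break whenever the site is rebuilt. As a rough sketch only, one alternative is to locate the label span by its text and read the spans in the same div. The helper name `salary_spans` and the sample markup are hypothetical, and the sketch assumes the value span shares a parent div with the "Salary" label, which the real page may or may not do:

```python
from bs4 import BeautifulSoup


def salary_spans(html):
    """Return the text of each span in the div that holds a 'Salary' label span."""
    soup = BeautifulSoup(html, "html.parser")
    for span in soup.find_all("span"):
        if span.get_text(strip=True).startswith("Salary"):
            container = span.find_parent("div")
            if container is not None:
                return [s.get_text(strip=True) for s in container.find_all("span")]
    return []  # no label found, e.g. the page was not rendered


# Hypothetical rendered markup, loosely modeled on the span shown in the question
sample_html = """
<div class="css-rcl8e5">
  <span class="css-wn0avc">Salary<!-- -->:</span>
  <span>Confidential</span>
</div>
"""

result = salary_spans(sample_html)
print(result)
```

With Selenium, the same helper could be called as salary_spans(driver.page_source) before the driver is closed.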
