Can't get all span tags inside a div element with BeautifulSoup
Question
I am scraping this site and I need to get the salary value from it as shown in the image
I tried the following:
import requests
from bs4 import BeautifulSoup

result = requests.get("https://wuzzuf.net/jobs/p/xGYIYbJlYhsC-Senior-Python-Developer-Cairo-Egypt?o=1&l=sp&t=sj&a=python|search-v3|hpb")
page = result.content
soup = BeautifulSoup(page, "lxml")
salaries_div = soup.find_all("div", {"class": "css-rcl8e5"})
for span in salaries_div[3].select("span"):
    print(span)
But I only get this span:
<span class="css-wn0avc">Salary<!-- -->:</span>
My question is: why can't I get all the spans inside the div? And what should I do to get the salary value in this case?
Recommended answer
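Beautiful Soup is just a parser that works with the content you give it; it has nothing to do with page retrieval or rendering. requests returns the HTML as the server sent it, without running any JavaScript, so any span the page adds with JavaScript (as the salary value appears to be here) never reaches the parser.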
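To illustrate the point, here is a minimal sketch that parses a small, hypothetical snippet of HTML. The markup and the "Confidential" value are assumptions made up for the demonstration, not the real page. The script inside the div would add a second span in a browser, but Beautiful Soup never runs it, so only the label span is found.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the raw response of requests.get().
# In a browser, the script would append a second <span> to the div.
RAW_HTML = """
<div class="css-rcl8e5">
  <span class="css-wn0avc">Salary<!-- -->:</span>
  <script>
    var s = document.createElement("span");
    s.textContent = "Confidential";
    document.currentScript.parentNode.appendChild(s);
  </script>
</div>
"""

# html.parser is used here only to keep the sketch free of extra dependencies.
soup = BeautifulSoup(RAW_HTML, "html.parser")
spans = soup.select("div.css-rcl8e5 span")

# The script is kept as plain text and never executed,
# so the span it would create is not in the parsed tree.
print(len(spans))
for span in spans:
    print(span)
```

The same thing happens with the page in the question: the parser sees the label span that is present in the served HTML and nothing that JavaScript adds later.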
The solution I found in my case was to use Selenium to get the JS-rendered page.
Working code:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Selenium 4 style: pass the downloaded driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://wuzzuf.net/jobs/p/xGYIYbJlYhsC-Senior-Python-Developer-Cairo-Egypt?o=1&l=sp&t=sj&a=python|search-v3|hpb")
page = driver.page_source  # HTML after the browser has run the page's JavaScript
driver.quit()

soup = BeautifulSoup(page, "lxml")
salaries_div = soup.find_all("div", {"class": "css-rcl8e5"})
for span in salaries_div[3].select("span"):
    print(span)
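The code above picks the salary block by position (salaries_div[3]), which breaks if the page adds or removes a block. As a hedged alternative, the sketch below looks for the div whose first span starts with the label "Salary" and returns the text of the spans after it. The helper name find_salary, the sample markup, and the values in it are hypothetical; the real rendered page may be structured differently.

```python
from bs4 import BeautifulSoup


def find_salary(html, label="Salary"):
    """Return the text of the spans after the label span, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    for div in soup.find_all("div", {"class": "css-rcl8e5"}):
        spans = div.select("span")
        # The first span is assumed to hold the label, e.g. "Salary:"
        if spans and spans[0].get_text(strip=True).startswith(label):
            values = [s.get_text(strip=True) for s in spans[1:]]
            return " ".join(values) if values else None
    return None


# Hypothetical rendered markup, e.g. what driver.page_source might contain.
SAMPLE = """
<div class="css-rcl8e5"><span class="css-wn0avc">Experience Needed<!-- -->:</span><span>3 years</span></div>
<div class="css-rcl8e5"><span class="css-wn0avc">Salary<!-- -->:</span><span>Confidential</span></div>
"""

print(find_salary(SAMPLE))
```

With the Selenium code, the call would be find_salary(page). A None result means either that no matching div was found or that the value span was not in the HTML, which is what happens with the unrendered response from requests.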