Why is BeautifulSoup's findAll returning an empty list when I search by class?

Question

I am trying to web-scrape using an h2 tag, but BeautifulSoup returns an empty list.

<h2 class="iCIMS_InfoMsg iCIMS_InfoField_Job">

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("https://careersus-endologix.icims.com/jobs/2034/associate-supplier-quality-engineer/job")
bs0bj = BeautifulSoup(html, "lxml")
nameList = bs0bj.findAll("h2", {"class": "iCIMS_InfoMsg iCIMS_InfoField_Job"})
print(nameList)  # prints [] for this page

Recommended answer

The content is inside an iframe and is populated via JavaScript, so it is not present in the response to the initial request. You can instead request the same URL the page uses to load the iframe content (the iframe src). From that response, take the text of the script tag that holds the job information, parse it with json, and extract the description field, which is itself HTML. Pass that HTML back to BeautifulSoup and select the h2 tags. The rest of the information is then also available in the second soup object if you need it.

import requests
from bs4 import BeautifulSoup as bs
import json

# Request the iframe src directly instead of the outer job page
r = requests.get('https://careersus-endologix.icims.com/jobs/2034/associate-supplier-quality-engineer/job?mobile=false&width=1140&height=500&bga=true&needsRedirect=false&jan1offset=0&jun1offset=60&in_iframe=1')
soup = bs(r.content, 'lxml')
# The job data is embedded as JSON inside a script tag
script = soup.select_one('[type="application/ld+json"]').text
data = json.loads(script)
# 'description' is an HTML string, so parse it into a second soup
soup = bs(data['description'], 'lxml')
headers = [item.text for item in soup.select('h2')]
print(headers)
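
The same steps can be tried offline before hitting the live site. The following is a minimal sketch, not the site's actual markup: `SAMPLE_IFRAME_HTML` and the helper `extract_headers` are hypothetical names made up for illustration, and the sample only assumes that the iframe response carries a `script` tag of type `application/ld+json` whose `description` field holds HTML. It uses the built-in `html.parser` so that lxml is not required, and it returns an empty list instead of raising when the script tag or the field is missing.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical stand-in for the iframe response; the real page's markup may differ.
_description_html = (
    '<h2 class="iCIMS_InfoMsg iCIMS_InfoField_Job">Overview</h2>'
    '<p>Some text</p>'
    '<h2 class="iCIMS_InfoMsg iCIMS_InfoField_Job">Responsibilities</h2>'
)
SAMPLE_IFRAME_HTML = (
    '<html><head><script type="application/ld+json">'
    + json.dumps({"description": _description_html})
    + '</script></head><body></body></html>'
)


def extract_headers(html):
    """Return the text of every h2 found in the JSON-LD 'description' field."""
    soup = BeautifulSoup(html, "html.parser")
    script = soup.find("script", type="application/ld+json")
    if script is None or script.string is None:
        return []  # no embedded JSON on this page
    try:
        data = json.loads(script.string)
    except json.JSONDecodeError:
        return []  # script tag present but not valid JSON
    description = data.get("description") if isinstance(data, dict) else None
    if not description:
        return []
    # The description is itself HTML, so it needs a second parse.
    inner = BeautifulSoup(description, "html.parser")
    return [h2.get_text(strip=True) for h2 in inner.find_all("h2")]


if __name__ == "__main__":
    print(extract_headers(SAMPLE_IFRAME_HTML))
```

With the real site, `html` would be the body of the response to the iframe URL rather than the sample string. An empty result then suggests that the page structure differs from what this sketch assumes.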

