Not able to scrape data using BeautifulSoup


Question

I'm using Selenium to log in to a webpage and fetch it for scraping, and I'm able to get the page. I searched the HTML for the table I want to scrape. Here it is:

<table cellspacing="0" class=" tablehasmenu table hoverable sensors" id="table_devicesensortable">

Here is the script:

rawpage = driver.page_source  # store the page source in a variable
souppage = BeautifulSoup(rawpage, 'html.parser')  # parse the page source
tbody = souppage.find('table', attrs={'id': 'table_devicesensortable'})  # look up the table by id

I'm able to get the parsed webpage in the souppage variable, but I'm not able to scrape the table and store it in the tbody variable.

Recommended answer

The required table is probably generated dynamically, so driver.page_source may be read before the table exists in the DOM. Wait until the table is present on the page:

from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait

tbody = wait(driver, 10).until(EC.presence_of_element_located((By.ID, "table_devicesensortable")))
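
To make the idea of an explicit wait concrete, here is a minimal sketch of the polling loop it relies on, written in plain Python so it runs without a browser. It is not Selenium's implementation: the names wait_until and table_is_present are hypothetical, and the "page" is simulated by a counter that only reports the table on the third check. The 0.5 second default mirrors WebDriverWait's default polling interval.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulated page state (an assumption for this demo): the table only
# "appears" on the third check, like an element inserted by JavaScript.
attempts = {"count": 0}


def table_is_present():
    attempts["count"] += 1
    return "table_devicesensortable" if attempts["count"] >= 3 else None


found = wait_until(table_is_present, timeout=2.0, poll_interval=0.01)
print(found, attempts["count"])
```

With a real driver, the condition would be something like EC.presence_of_element_located applied to the driver, and a timeout surfaces as Selenium's TimeoutException rather than the built-in TimeoutError used in this sketch.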

Also note that BeautifulSoup is not needed here, since Selenium has enough built-in methods and properties to do the same job, e.g.:

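# find_elements_by_tag_name was removed in Selenium 4; use find_elements with By (imported above)
headers = tbody.find_elements(By.TAG_NAME, "th")
rows = tbody.find_elements(By.TAG_NAME, "tr")
cells = tbody.find_elements(By.TAG_NAME, "td")
cell_values = [cell.text for cell in cells]
etc...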
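
The flat cell_values list above loses track of which cell belongs to which row. A possible way to keep the row structure is sketched below. The helper table_to_rows is a hypothetical name, and it assumes only that the element exposes find_elements(by, value) and .text the way a Selenium WebElement does. FakeElement and the sample sensor values are made-up stand-ins so the sketch runs without a browser; with a real page you would pass the tbody element returned by the wait.

```python
# Same string that selenium.webdriver.common.by.By.TAG_NAME holds; defined
# here so the sketch runs without Selenium installed.
TAG_NAME = "tag name"


def table_to_rows(table):
    """Return one list of cell texts per <tr> that contains <td> cells."""
    rows = []
    for tr in table.find_elements(TAG_NAME, "tr"):
        cells = tr.find_elements(TAG_NAME, "td")
        if cells:  # rows made only of <th> header cells are skipped
            rows.append([cell.text for cell in cells])
    return rows


class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement, for demonstration only."""

    def __init__(self, tag, text="", children=()):
        self.tag = tag
        self.text = text
        self.children = list(children)

    def find_elements(self, by, value):
        # Recursively collect descendants whose tag name matches `value`.
        found = []
        for child in self.children:
            if child.tag == value:
                found.append(child)
            found.extend(child.find_elements(by, value))
        return found


# Made-up sample data: one header row and one data row.
demo_table = FakeElement("table", children=[
    FakeElement("tr", children=[FakeElement("th", "Sensor"), FakeElement("th", "Status")]),
    FakeElement("tr", children=[FakeElement("td", "Ping"), FakeElement("td", "Up")]),
])
print(table_to_rows(demo_table))
```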
