How can I parse table data from a website using Selenium?
Question
I'm trying to parse the table present on the [website][1] using Selenium, as I am a beginner. I'm struggling to do that; here is my code:

[1]: http://www.espncricinfo.com/rankings/content/page/211270.html
from bs4 import BeautifulSoup
import time
from selenium import webdriver
url = "http://www.espncricinfo.com/rankings/content/page/211270.html"
browser = webdriver.Chrome()
browser.get(url)
time.sleep(3)
html = browser.page_source
soup = BeautifulSoup(html, "lxml")
print(len(soup.find_all("table")))
print(soup.find("table", {"class": "expanded_standings"}))
browser.close()
browser.quit()
That's what I tried, but I'm unable to fetch anything with it. Any suggestions would be really helpful, thanks.
Answer
The table you are after is within an iframe. So, to get the data from that table you need to switch to that iframe first and then do the rest. Here is one way you could do it:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("http://www.espncricinfo.com/rankings/content/page/211270.html")
wait = WebDriverWait(driver, 10)

# To target a different table, change the index within nth-of-type()
# and the iframe name in the selector
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe[name='testbat']:nth-of-type(1)")))
for row in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "table tr")))[1:]:
    data = [cell.text for cell in row.find_elements(By.CSS_SELECTOR, "th,td")]
    print(data)
driver.quit()
In this very case, though, the best approach would be the following. No browser simulator is used; only requests and BeautifulSoup have been used:
import requests
from bs4 import BeautifulSoup

res = requests.get("http://www.espncricinfo.com/rankings/content/page/211270.html")
soup = BeautifulSoup(res.text, "lxml")

# To target a different table, change the index number
# and the iframe name in the selector
iframe_src = soup.select("iframe[name='testbat']")[0]['src']
req = requests.get(iframe_src)
sauce = BeautifulSoup(req.text, "lxml")
for row in sauce.select("table tr"):
    data = [cell.text for cell in row.select("th,td")]
    print(data)
Partial output:
['Rank', 'Name', 'Country', 'Rating']
['1', 'S.P.D. Smith', 'AUS', '947']
['2', 'V. Kohli', 'IND', '912']
['3', 'J.E. Root', 'ENG', '881']
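Since the live page may change or go offline, the row-extraction pattern used above (`select("table tr")` plus `select("th,td")` per row) can be checked offline against a small stand-in table. The HTML below is made-up sample data for illustration only:

```python
from bs4 import BeautifulSoup

# A tiny stand-in table (hypothetical data) mimicking the rankings layout.
html = """
<table>
  <tr><th>Rank</th><th>Name</th><th>Country</th><th>Rating</th></tr>
  <tr><td>1</td><td>S.P.D. Smith</td><td>AUS</td><td>947</td></tr>
  <tr><td>2</td><td>V. Kohli</td><td>IND</td><td>912</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Same pattern as the answer: iterate rows, then pick header/data cells.
rows = []
for tr in soup.select("table tr"):
    rows.append([cell.get_text(strip=True) for cell in tr.select("th,td")])

print(rows[0])  # ['Rank', 'Name', 'Country', 'Rating']
print(rows[1])  # ['1', 'S.P.D. Smith', 'AUS', '947']
```

This makes it easy to confirm the selectors behave as expected before pointing them at the real iframe content.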