How can I parse table data from a website using Selenium?


Question

I'm trying to parse the table on this page using Selenium: http://www.espncricinfo.com/rankings/content/page/211270.html

I'm a beginner and I'm struggling to get it working. Here is my code:

from bs4 import BeautifulSoup
import time
from selenium import webdriver

url = "http://www.espncricinfo.com/rankings/content/page/211270.html"
browser = webdriver.Chrome()

browser.get(url)
time.sleep(3)
html = browser.page_source
soup = BeautifulSoup(html, "lxml")

print(len(soup.find_all("table")))
print(soup.find("table", {"class": "expanded_standings"}))

browser.close()
browser.quit()

That is what I tried, but I'm unable to fetch anything from the page. Any suggestions would be really helpful. Thanks.

Recommended answer

The table you are after is inside an iframe, so the source of the outer page does not contain it. To get the data, switch into that iframe first and then locate the rows. Here is one way to do it:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("http://www.espncricinfo.com/rankings/content/page/211270.html")
wait = WebDriverWait(driver, 10)
# To target a different table, change the iframe name in the selector
# (and the index inside nth-of-type() if needed).
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe[name='testbat']:nth-of-type(1)")))
# [1:] skips the first row, which holds the column headers.
for row in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "table tr")))[1:]:
    # find_elements(By.CSS_SELECTOR, ...) replaces find_elements_by_css_selector,
    # which recent Selenium 4 releases no longer provide.
    data = [cell.text for cell in row.find_elements(By.CSS_SELECTOR, "th,td")]
    print(data)
driver.quit()
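
To see why the attempt in the question found nothing, it can help to look at what the outer document actually contains. The sketch below uses only the standard library and a hypothetical stand-in for the outer page (the `OUTER_HTML` string and its `src` path are made up for illustration, not copied from the real site). It counts start tags to show that the outer markup holds an iframe but no table.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical stand-in for the outer page: the ranking table is not here,
# only an <iframe> that points at the document holding it.
OUTER_HTML = """
<html><body>
  <h1>Rankings</h1>
  <iframe name="testbat" src="/hypothetical/batting.html"></iframe>
</body></html>
"""


class TagCounter(HTMLParser):
    """Count how many times each start tag occurs in a document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def count_tags(html):
    """Return a Counter mapping tag name to number of occurrences."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


counts = count_tags(OUTER_HTML)
print(counts["table"], counts["iframe"])  # prints: 0 1
```

Running the same kind of check on `browser.page_source` is a quick way to tell whether the content you want sits in the page itself or in a nested frame.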

In this particular case, though, the better approach needs no browser automation at all. The following uses only requests and BeautifulSoup:

import requests
from bs4 import BeautifulSoup

res = requests.get("http://www.espncricinfo.com/rankings/content/page/211270.html")
soup = BeautifulSoup(res.text,"lxml")
# To target a different table, change the iframe name in the selector
# (and the list index if needed).
iframe_src = soup.select("iframe[name='testbat']")[0]['src']
# Fetch the document the iframe points to; the table lives there.
frame_res = requests.get(iframe_src)
frame_soup = BeautifulSoup(frame_res.text, "lxml")
for row in frame_soup.select("table tr"):
    data = [cell.text for cell in row.select("th,td")]
    print(data)
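
The requests version above passes the iframe's `src` straight to `requests.get`, which works only when that attribute is an absolute URL. If a page uses a relative `src`, it has to be resolved against the page URL first. The following is a minimal sketch of that step using only the standard library; `IframeSrcFinder`, `iframe_urls`, and the sample markup are hypothetical names and data chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcFinder(HTMLParser):
    """Collect the src of every <iframe> whose name attribute matches."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.sources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (
            tag == "iframe"
            and attributes.get("name") == self.name
            and attributes.get("src")
        ):
            self.sources.append(attributes["src"])


def iframe_urls(html, name, base_url):
    """Return absolute URLs for all matching iframes in the given markup."""
    finder = IframeSrcFinder(name)
    finder.feed(html)
    finder.close()
    # urljoin leaves absolute URLs untouched and resolves relative ones.
    return [urljoin(base_url, src) for src in finder.sources]


PAGE_URL = "http://www.espncricinfo.com/rankings/content/page/211270.html"
# Hypothetical markup; the real page's src value is not shown in this thread.
SAMPLE = '<iframe name="testbat" src="/hypothetical/batting.html"></iframe>'

print(iframe_urls(SAMPLE, "testbat", PAGE_URL))
# prints: ['http://www.espncricinfo.com/hypothetical/batting.html']
```

With BeautifulSoup the same idea is a one-line change: wrap the extracted `src` in `urljoin(page_url, src)` before requesting it.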

Partial output:

['Rank', 'Name', 'Country', 'Rating']
['1', 'S.P.D. Smith', 'AUS', '947']
['2', 'V. Kohli', 'IND', '912']
['3', 'J.E. Root', 'ENG', '881']
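
Once the rows are available as lists like the ones above, a common next step is to key each row by the header so the values can be looked up by column name or written to a file. This is only a sketch of one way to do that with the standard library; `rows_to_records` and `records_to_csv` are hypothetical helper names, and the sample rows are the partial output shown above.

```python
import csv
import io

# The partial output from the answer, used here as sample data.
rows = [
    ["Rank", "Name", "Country", "Rating"],
    ["1", "S.P.D. Smith", "AUS", "947"],
    ["2", "V. Kohli", "IND", "912"],
    ["3", "J.E. Root", "ENG", "881"],
]


def rows_to_records(rows):
    """Treat the first row as the header and map every other row onto it."""
    header, *body = rows
    # Skip rows whose cell count does not match the header (e.g. spacer rows).
    return [dict(zip(header, row)) for row in body if len(row) == len(header)]


def records_to_csv(records, fieldnames):
    """Serialize the records as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = rows_to_records(rows)
print(records[0]["Name"])  # prints: S.P.D. Smith
print(records_to_csv(records, rows[0]))
```

All values stay as strings here; converting `Rank` and `Rating` to integers is left out because the full table may contain cells that are not numeric.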
