Python 3: extract HTML data from a sports site

Views: 127. Posted: 2020/11/24 6:20:15. Tags: python, html, web-scraping, beautifulsoup

Question

I have been trying to extract data from a sports site, so far without success. I am trying to extract the values 35, "Shots on Goal" and 23 from this markup, but every attempt has failed.

<div class="statTextGroup">
   <div class="statText statText--homeValue">35</div>
   <div class="statText statText--titleValue">Shots on Goal</div>
   <div class="statText statText--awayValue">23</div></div>

from bs4 import BeautifulSoup
import requests

result = requests.get("https://www.scoreboard.com/uk/match/lvbns58C/#match-statistics;0")
src = result.content

soup = BeautifulSoup(src, 'html.parser')

stats = soup.find("div", {"class": "tab-statistics-0-statistic"})
print(stats)

This is the code I have been trying to use. When I run it, it prints "None". Could someone help me print out the data?

The full page can be found here: https://www.scoreboard.com/uk/match/lvbns58C/#match-statistics;0

Recommended answer

As the website is rendered by JavaScript, one possible option is to load the page with Selenium and then parse it with BeautifulSoup:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

# initialize selenium driver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
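# Selenium 3 style call; Selenium 4 expects the driver path wrapped in a Service object passed as service=
wd = webdriver.Chrome('<<PATH_TO_SELENIUMDRIVER>>', options=chrome_options)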

# load page via selenium
wd.get("https://www.scoreboard.com/uk/match/lvbns58C/#match-statistics;0")

# wait up to 30 seconds until the element with id 'statistics-content' is present
table = WebDriverWait(wd, 30).until(EC.presence_of_element_located((By.ID, 'statistics-content')))

# parse content of the table
soup = BeautifulSoup(table.get_attribute('innerHTML'), 'html.parser')

print(soup)

# close selenium driver
wd.quit()
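
Once the rendered HTML is available, the three values from the question can be read by class name. The following is a minimal sketch rather than a definitive implementation: it parses the snippet from the question as a static string, the helper name `extract_stats` is hypothetical, and it assumes the live page still uses the `statTextGroup`, `statText--homeValue`, `statText--titleValue` and `statText--awayValue` classes shown in the question.

```python
from bs4 import BeautifulSoup

# Markup copied from the question; on the live page this would be the
# innerHTML returned by Selenium instead of a hard-coded string.
SAMPLE_HTML = """
<div class="statTextGroup">
   <div class="statText statText--homeValue">35</div>
   <div class="statText statText--titleValue">Shots on Goal</div>
   <div class="statText statText--awayValue">23</div>
</div>
"""


def extract_stats(html):
    """Return a list of (home, title, away) tuples, one per statTextGroup.

    Hypothetical helper: the class names are taken from the question and
    are assumed to match the rendered page.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for group in soup.select("div.statTextGroup"):
        home = group.select_one(".statText--homeValue")
        title = group.select_one(".statText--titleValue")
        away = group.select_one(".statText--awayValue")
        # Skip groups that do not contain all three cells.
        if home is None or title is None or away is None:
            continue
        rows.append((
            home.get_text(strip=True),
            title.get_text(strip=True),
            away.get_text(strip=True),
        ))
    return rows


if __name__ == "__main__":
    for home, title, away in extract_stats(SAMPLE_HTML):
        print(home, title, away)
```

With the Selenium code above, the same helper could be called as `extract_stats(table.get_attribute('innerHTML'))`, assuming the `statistics-content` element contains groups of this shape.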
