如何使用python从html页面中提取多个值? [英] How do I pull multiple values from html page using python?
问题描述
我正在根据 nhl 点差/投注赔率信息对自己的知识进行一些数据分析.我能够提取一些信息,但不能提取整个数据集.我想将游戏列表和相关内容拉入熊猫数据帧,但我已经能够围绕 html 标签执行正确的循环.我已经尝试了 findAll
选项和 xpath 路由.我也没有成功.
I'm performing some data analysis for my own knowledge from nhl spread/betting odds information. I'm able to pull some information, but Not the entire data set. I want to pull the list of games and the associated into a panda dataframe, but I have been able to perform the proper loop around the html tags. I've tried the findAll
option and the xpath route. I'm not successful with either.
from bs4 import BeautifulSoup
import requests
page_link = 'https://www.thespread.com/nhl-hockey-public-betting-chart'
page_response = requests.get(page_link, timeout=5)
# here, we fetch the content from the url, using the requests library
page_content = BeautifulSoup(page_response.content, "html.parser")
# Take out the <div> of name and get its value
name_box = page_content.find('div', attrs={'class': 'datarow'})
name = name_box.text.strip()
print (name)
推荐答案
此脚本遍历每个数据行并单独拉出每个项目,然后将它们附加到 Pandas DataFrame 中.
This script goes through each datarow and pulls out each item individually and then appends them into a pandas DataFrame.
from bs4 import BeautifulSoup
import requests
import pandas as pd
page_link = 'https://www.thespread.com/nhl-hockey-public-betting-chart'
page_response = requests.get(page_link, timeout=5)
# here, we fetch the content from the url, using the requests library
page_content = BeautifulSoup(page_response.content, "html.parser")
# Take out the <div> of name and get its value
tables = page_content.find_all('div', class_='datarow')
# Iterate through rows
rows = []
# Iterate through each datarow and pull out each home/away separately
for table in tables:
# Get time and date
time_and_date_tag = table.find_all('div', attrs={"class": "time"})[0].contents
date = time_and_date_tag[1]
time = time_and_date_tag[-1]
# Get teams
teams_tag = table.find_all('div', attrs={"class": "datacell teams"})[0].contents[-1].contents
home_team = teams_tag[1].text
away_team = teams_tag[-1].text
# Get opening
opening_tag = table.find_all('div', attrs={"class": "child-open"})[0].contents
home_open_value = opening_tag[1]
away_open_value = opening_tag[-1]
# Get current
current_tag = table.find_all('div', attrs={"class": "child-current"})[0].contents
home_current_value = current_tag[1]
away_current_value = current_tag[-1]
# Create list
rows.append([time, date, home_team, away_team,
home_open_value, away_open_value,
home_current_value, away_current_value])
columns = ['time', 'date', 'home_team', 'away_team',
'home_open', 'away_open',
'home_current', 'away_current']
print(pd.DataFrame(rows, columns=columns))
这篇关于如何使用python从html页面中提取多个值?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!