Pulling the href from a link when web scraping using Python


Question

I am scraping from this page: https://www.pro-football-reference.com/years/2018/week_1.htm

It is a list of game scores for American football. I want to open the link to the stats for the first game; the link text for that game is "Final". My code so far:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup


#assigning url
my_url = "https://www.pro-football-reference.com/years/2018/week_1.htm"

# opening up connection, grabbing the page
raw_page = uReq(my_url)
page_html = raw_page.read()
raw_page.close()

# html parsing
page_soup = soup(page_html,"html.parser")

#find all games on page
games = page_soup.find_all("div",{"class":"game_summary expanded nohover"})

link = games[0].find("td",{"class":"right gamelink"})
print(link)

When I run this, I receive the following output:

<a href="/boxscores/201809060phi.htm">Final</a>

How do I assign only the link target, i.e. the value of its href attribute ("/boxscores/201809060phi.htm"), to a variable?

Answer

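# find the <td> cell first, then the <a> tag nested inside it
link = games[0].find("td", {"class": "right gamelink"}).find("a")

# a Tag's HTML attributes are read like dictionary keys
href = link["href"]
print(href)  # /boxscores/201809060phi.htm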
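Since the goal was to open the stats page, note that the href is relative (it starts with "/"), so it must be resolved against the page it was scraped from before it can be fetched. The following is a minimal sketch, assuming the standard library's `urllib.parse.urljoin` is acceptable for this. The helper names `absolute_url` and `fetch_html` are hypothetical, and the network call is left commented out:

```python
from urllib.parse import urljoin
from urllib.request import urlopen

# The page the href was scraped from (taken from the question).
PAGE_URL = "https://www.pro-football-reference.com/years/2018/week_1.htm"


def absolute_url(page_url: str, href: str) -> str:
    """Resolve a (possibly relative) href against the page it came from.

    Hypothetical helper: a thin wrapper around urllib.parse.urljoin.
    """
    return urljoin(page_url, href)


def fetch_html(url: str) -> bytes:
    """Download a page; the with-block closes the connection automatically.

    Hypothetical helper: the context-manager equivalent of the question's
    uReq(...) / read() / close() sequence.
    """
    with urlopen(url) as response:
        return response.read()


if __name__ == "__main__":
    game_url = absolute_url(PAGE_URL, "/boxscores/201809060phi.htm")
    print(game_url)  # https://www.pro-football-reference.com/boxscores/201809060phi.htm
    # game_html = fetch_html(game_url)  # network call, left commented out
```

Using `urljoin` instead of concatenating a hard-coded domain with the href also handles hrefs that are already absolute, which it returns unchanged.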
