How can I get href links from HTML using Python?

Views: 58 Published: 2021/12/23 19:46:10 Tags: python html hyperlink beautifulsoup href


Question

import urllib2  # Python 2 standard library

website = "WEBSITE"
openwebsite = urllib2.urlopen(website)
html = openwebsite.read()  # read the response body as a string

print html

So far so good.

But I only want the href links from the plain-text HTML. How can I solve this problem?

Recommended answer

Try using BeautifulSoup:

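# Python 2 with BeautifulSoup 3
from BeautifulSoup import BeautifulSoup
import urllib2
import re  # needed for the regex filter below

html_page = urllib2.urlopen("http://www.yourwebsite.com")
soup = BeautifulSoup(html_page)
# Print the href attribute of every <a> tag (None if the tag has no href).
for link in soup.findAll('a'):
    print link.get('href')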

If you only want links that start with http://, use:

soup.findAll('a', attrs={'href': re.compile("^http://")})
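
To make clear what that pattern keeps and what it drops, here is a minimal sketch that uses only the `re` module. The sample href values and the `https?` variant are illustrative assumptions, not part of the original answer; the sketch relies on the pattern being applied to each href as a search anchored at the start of the string.

```python
import re

# Same pattern as in the answer: only hrefs that begin with "http://".
http_only = re.compile("^http://")
# Hypothetical variant (not from the answer) that also accepts "https://".
http_or_https = re.compile("^https?://")

# Made-up href values for illustration.
hrefs = [
    "http://example.com/a",
    "https://example.com/b",
    "/relative/c",
    "mailto:me@example.com",
]

matched_http_only = [h for h in hrefs if http_only.search(h)]
matched_both = [h for h in hrefs if http_or_https.search(h)]

print(matched_http_only)
print(matched_both)
```

Note that the original pattern skips https:// links as well as relative links, so the broader variant may be closer to what you want on most modern sites.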

In Python 3 with BS4, it should be:

from bs4 import BeautifulSoup
import urllib.request

html_page = urllib.request.urlopen("http://www.yourwebsite.com")
soup = BeautifulSoup(html_page, "html.parser")
# find_all is the BS4 name for the older findAll alias.
for link in soup.find_all('a'):
    print(link.get('href'))
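
If installing BeautifulSoup is not an option, roughly the same result can be sketched with the standard library's `html.parser` module. This is a different technique from the one in the answer, shown only as an illustration; the `HrefCollector` class name and the sample markup are made up for this example, and a real page would come from `urllib.request.urlopen(...).read().decode(...)` rather than an inline string.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


# Made-up sample markup standing in for a downloaded page.
sample_html = """
<html><body>
  <a href="http://example.com/one">One</a>
  <a href="/two">Two</a>
  <a name="anchor-without-href">Three</a>
</body></html>
"""

collector = HrefCollector()
collector.feed(sample_html)
print(collector.hrefs)
```

Unlike `link.get('href')`, which yields None for anchors without an href, this sketch simply skips them.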
