How can I get href links from HTML using Python?

Question

import urllib2

website = "WEBSITE"
openwebsite = urllib2.urlopen(website)
html = openwebsite.read()

print html

So far so good.

But I want only href links from the plain text HTML. How can I solve this problem?

Solution

Try BeautifulSoup:

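from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 (Python 2); BeautifulSoup 4 is imported from bs4
import urllib2
import re  # used by the filtered findAll call further down

html_page = urllib2.urlopen("http://www.yourwebsite.com")
soup = BeautifulSoup(html_page)
# Visit every <a> tag and print its href attribute (None if the tag has no href)
for link in soup.findAll('a'):
    print link.get('href')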
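
For readers on Python 3, where urllib2 no longer exists, here is a minimal sketch of the same idea. It assumes only the standard library: html.parser.HTMLParser is swapped in for BeautifulSoup so the example runs without extra packages. The names HrefCollector, extract_hrefs and sample_html are illustrative and do not come from the answer above.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(html):
    """Return all href values found on <a> tags in the given HTML string."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Hypothetical sample document, used here instead of a live download.
sample_html = """
<html><body>
  <a href="http://www.yourwebsite.com/page1">First</a>
  <a href="/relative/page2">Second</a>
  <a name="anchor-without-href">Third</a>
</body></html>
"""

links = extract_hrefs(sample_html)
print(links)
```

To run this against a live page, the HTML string could be fetched with urllib.request.urlopen, the Python 3 replacement for urllib2.urlopen, and decoded before being passed to extract_hrefs.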

If you only want links starting with http://, use:

soup.findAll('a', attrs={'href': re.compile("^http://")})
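
Note that the pattern ^http:// does not match links that start with https://. The sketch below applies the same pattern to plain strings to show what the filter keeps. The list of hrefs, the helper filter_hrefs and the ^https?:// variant are assumptions added for illustration, not part of the original answer.

```python
import re

# Same pattern as in the answer: keeps only hrefs that start with "http://".
http_only = re.compile(r"^http://")
# Assumed variant that also accepts "https://".
http_or_https = re.compile(r"^https?://")

# Hypothetical href values such as a page might contain.
hrefs = [
    "http://example.com/a",
    "https://example.com/b",
    "/relative/c",
    "mailto:someone@example.com",
]


def filter_hrefs(values, pattern):
    """Return the values in which the compiled pattern finds a match."""
    return [value for value in values if pattern.search(value)]


http_links = filter_hrefs(hrefs, http_only)
web_links = filter_hrefs(hrefs, http_or_https)
print(http_links)
print(web_links)
```

If both schemes are wanted, the second pattern could be passed to findAll in place of the first.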
