Extracting URLs from HTML with Python
Question
I need a Python regex to extract URLs from HTML. Example HTML code:
<a href=""http://a0c5e.site.it/r"" target=_blank><font color=#808080>MailUp</font></a>
<a href=""http://www.site.it/prodottiLLPP.php?id=1"" class=""txtBlueGeorgia16"">Prodotti</a>
<a href=""http://www.site.it/terremoto.php"" target=""blank"" class=""txtGrigioScuroGeorgia12"">Terremoto</a>
<a class='mini' href='http://www.site.com/remove/professionisti.aspx?Id=65&Code=xhmyskwzse'>clicca qui.</a>
I only need to extract:
http://a0c5e.site.it/r
http://www.site.it/prodottiLLPP.php?id=1
http://www.site.it/terremoto.php
http://www.site.com/remove/professionisti.aspx?Id=65&Code=xhmyskwzse
Recommended answer
Observe:
Python 2.7.3 (default, Sep 4 2012, 20:19:03)
[GCC 4.2.1 20070831 patched [FreeBSD]] on freebsd9
Type "help", "copyright", "credits" or "license" for more information.
>>> junk=''' <a href=""http://a0c5e.site.it/r"" target=_blank><font color=#808080>MailUp</font></a>
... <a href=""http://www.site.it/prodottiLLPP.php?id=1"" class=""txtBlueGeorgia16"">Prodotti</a>
... <a href=""http://www.site.it/terremoto.php"" target=""blank"" class=""txtGrigioScuroGeorgia12"">Terremoto</a>
... <a class='mini' href='http://www.site.com/remove/professionisti.aspx?Id=65&Code=xhmyskwzse'>clicca qui.</a>`'''
>>> import re
>>> pat=re.compile(r'''http[\:/a-zA-Z0-9\.\?\=&]*''')
>>> pat.findall(junk)
['http://a0c5e.site.it/r', 'http://www.site.it/prodottiLLPP.php?id=1', 'http://www.site.it/terremoto.php', 'http://www.site.com/remove/professionisti.aspx?Id=65&Code=xhmyskwzse']
You might want to add % to the character class so that percent-encoded escapes (for example %20) are captured as well.
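As a rough illustration of that suggestion, the sketch below takes the same character class and adds % to it, with no other change. The sample string and the names URL_PATTERN and extract_urls are hypothetical, made up for this example. The code is written for Python 3, whereas the session above used Python 2.7.

```python
import re

# The answer's character class with '%' added (the only change).
# The backslashes from the original pattern are dropped because ':', '.',
# '?' and '=' need no escaping inside a character class.
URL_PATTERN = re.compile(r"http[:/a-zA-Z0-9.?=&%]*")

# Hypothetical sample input, not taken from the original question.
sample = '<a href="http://www.site.it/cerca.php?q=caff%C3%A8%20latte">Cerca</a>'


def extract_urls(html):
    """Return every substring of html matched by URL_PATTERN."""
    return URL_PATTERN.findall(html)


print(extract_urls(sample))
```

Without the %, the match on this sample would stop at the first percent sign. Characters such as - and _ are likewise absent from the class, so URLs containing them would also be cut short unless they are added too. Note that the pattern matches any text starting with http, not only href values.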