Regular expression for finding the 'href' value of an <a> link
Question
I need a regex pattern for finding web page links in HTML.
I first used @"(<a.*?>.*?</a>)" to extract links (<a>), but I can't fetch the href from that.
My strings are:
<a href="www.example.com/page.php?id=xxxx&name=yyyy" ....></a>
<a href="http://www.example.com/page.php?id=xxxx&name=yyyy" ....></a>
<a href="https://www.example.com/page.php?id=xxxx&name=yyyy" ....></a>
<a href="www.example.com/page.php/404" ....></a>
1, 2 and 3 are valid and I need them, but number 4 is not valid for me (? and = are essential).
Thanks everyone, but I don't need to parse <a>. I have a list of links in href="abcdef" format.
I need to fetch the href of each link and filter it: the URLs I want must contain ? and =, like page.php?id=5
Thanks!
Recommended answer
Here's a regex that creates a capturing group over the value of the href attribute of each link.
<a[^>]* href="([^"]*)"
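A minimal sketch of how this pattern might be applied, written in Python purely for illustration (the pattern itself is not Python-specific). The helper name `wanted_hrefs` and the simplified sample HTML are assumptions, not part of the original answer. It collects the captured href values and keeps only those containing both ? and =, which is the filter the question asks for.

```python
import re

# Pattern from the answer: capture the href value of each <a ...> tag.
HREF_RE = re.compile(r'<a[^>]* href="([^"]*)"')

def wanted_hrefs(html):
    """Return captured href values that contain both '?' and '='."""
    return [h for h in HREF_RE.findall(html) if "?" in h and "=" in h]

# Simplified versions of the four sample links from the question.
sample = '''
<a href="www.example.com/page.php?id=xxxx&name=yyyy"></a>
<a href="http://www.example.com/page.php?id=xxxx&name=yyyy"></a>
<a href="https://www.example.com/page.php?id=xxxx&name=yyyy"></a>
<a href="www.example.com/page.php/404"></a>
'''

# The fourth link has no query string, so it is filtered out.
print(wanted_hrefs(sample))
```

Doing the ? and = check as a separate step after extraction keeps the regex itself simple.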
Edit:
As mentioned in the comments, the previous expression will also match something like <area href="a.html">.
Here is the corrected version:
<a\s+(?:[^>]*?\s+)?href="([^"]*)"
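To see how the two patterns differ, here is a small Python sketch (the sample markup is made up for illustration). The first pattern accepts any tag whose name starts with "a", while the corrected one requires whitespace immediately after `<a`.

```python
import re

FIRST = re.compile(r'<a[^>]* href="([^"]*)"')
FIXED = re.compile(r'<a\s+(?:[^>]*?\s+)?href="([^"]*)"')

# Hypothetical input: an <area> tag followed by a real <a> link.
html = '<area href="a.html"><a class="x" href="page.php?id=5">link</a>'

print(FIRST.findall(html))  # also picks up the <area> href
print(FIXED.findall(html))  # only the <a> href
```

Both patterns still assume a double-quoted href value and a lowercase tag name. Input with single quotes or uppercase tags would need a looser pattern or a case-insensitive flag.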