Regular expression to parse links from HTML code


Question

Possible duplicate:
Regular expression to get the link in an href. [asp.net]

I'm working on a method that accepts a string (HTML code) and returns an array of all the links contained within it.

I've seen a few options, such as the HTML Agility Pack, but they seem a little more complicated than this project calls for.

I'm also interested in using regular expressions because I don't have much experience with them in general, and I think this would be a good learning opportunity.

My code so far is:

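    // Requires: using System.Net; using System.Text.RegularExpressions;
    // p is the address of the page to download.
    WebClient client = new WebClient();
    string htmlCode = client.DownloadString(p);
    // Matches only URLs of the form http://name.com or http://www.name.com
    Regex exp = new Regex(@"http://(www\.)?([^\.]+)\.com", RegexOptions.IgnoreCase);
    // Split breaks htmlCode apart at each match rather than collecting the matches.
    string[] test = exp.Split(htmlCode);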



But I'm not getting the results I want, because I'm still working on the regular expression.

Pseudocode for what I'm looking for is "

Recommended answer

If you are looking for a foolproof solution, regular expressions are not the answer. They are fundamentally limited and, because of the complexity of HTML, cannot be used to reliably parse links (or any other tags, for that matter) out of an HTML file.

  • Long Winded Version: http://blogs.msdn.com/jaredpar/archive/2008/10/15/regular-expression-limitations.aspx
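To make the limitation concrete, here is a small sketch of how a simple pattern goes wrong. The sample markup, the class name `RegexLinkDemo`, and the `NaiveHrefs` helper are all made up for illustration; the pattern is a deliberately naive one, not a recommended approach.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class RegexLinkDemo
{
    // Hypothetical sample markup chosen to show common stumbling blocks.
    public const string Sample =
        "<a href=\"http://example.com/a\">double quotes</a>\n" +
        "<a href='http://example.com/b'>single quotes</a>\n" +
        "<a href=http://example.com/c>no quotes</a>\n" +
        "<!-- <a href=\"http://example.com/old\">commented out</a> -->\n";

    // A naive pattern that only understands double-quoted href values.
    static readonly Regex Naive =
        new Regex("href=\"([^\"]*)\"", RegexOptions.IgnoreCase);

    public static List<string> NaiveHrefs(string html)
    {
        var found = new List<string>();
        // Regex.Matches collects the matches themselves, whereas
        // Regex.Split (used in the question) splits the input around them.
        foreach (Match m in Naive.Matches(html))
        {
            found.Add(m.Groups[1].Value); // group 1 is the quoted URL
        }
        return found;
    }
}
```

Calling `RegexLinkDemo.NaiveHrefs(RegexLinkDemo.Sample)` returns `http://example.com/a` and `http://example.com/old`: it misses the single-quoted and unquoted links and picks up one that sits inside a comment. Each of these cases can be patched individually, but the pattern keeps growing and still does not understand the document's structure.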

Instead, you'll need to use an actual HTML DOM API to parse out the links.
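As one possible illustration of the DOM approach, the sketch below uses the HTML Agility Pack, the library the question alludes to. It is a third-party NuGet package rather than part of .NET, and the answer does not name a specific library, so treat the choice as an assumption. The class name `LinkExtractor` and method name `GetLinks` are hypothetical.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package (assumed to be installed)

static class LinkExtractor
{
    // Returns the href value of every <a> element that has one.
    public static string[] GetLinks(string htmlCode)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(htmlCode); // parses the markup into a DOM tree

        var links = new List<string>();

        // XPath: every <a> element that carries an href attribute.
        // SelectNodes returns null (not an empty list) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors != null)
        {
            foreach (HtmlNode anchor in anchors)
            {
                links.Add(anchor.GetAttributeValue("href", string.Empty));
            }
        }
        return links.ToArray();
    }
}
```

In the question's code this would replace the `Regex` lines, for example `string[] test = LinkExtractor.GetLinks(htmlCode);`. The parser deals with quoting styles and comments, so the calling code only has to decide which of the returned links it cares about.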
