How to grab links from an HTML page

Problem description

Hi everyone,
Greetings for the day!

I'm trying to grab all the links in an HTML page. I've tried following this: http://www.dotnetperls.com/scraping-html and many other links, but couldn't find a suitable solution. The problem is that I can't find a "LinkItem" object in my IDE. I also tried using the System.Net and System.Diagnostics namespaces.

Please suggest how I can get past this problem. It may be minor or it may be major, I don't know.

Any help is appreciated.

Thanks,
Sunny K

Recommended answer

Hello, add a TextBox, a Label, and a Button to your page, and in Button1_Click try something like this:
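// Put these using directives at the top of the code-behind file;
// the two methods below belong inside the page class.
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
using System.Web;

protected void Button1_Click(object sender, EventArgs e)
{
    string url = TextBox1.Text;
    string html;

    // Download the raw HTML of the page and dispose the client when done.
    using (WebClient wc = new WebClient())
    {
        html = wc.DownloadString(url);
    }

    List<string> links = CollectLinks(html);

    // Build a two-column table: row number and URL.
    StringBuilder sb = new StringBuilder();
    int c = 1;
    sb.Append("<table> <tr> <td style=\"padding-right:40px;\"> # </td> <td> URL </td> </tr>");
    foreach (string link in links)
    {
        // HTML-encode each URL so characters such as & render correctly.
        sb.Append("<tr> <td> " + c + " </td> <td> " + HttpUtility.HtmlEncode(link) + " </td> </tr>");
        c++;
    }
    sb.Append("</table>");

    lblResult.Text = sb.ToString();
}

public List<string> CollectLinks(string strSource)
{
    List<string> links = new List<string>();

    // Matches text that looks like a URL starting with http://, https:// or www.
    // Note: this scans the whole source, not only href attributes.
    Regex r1 = new Regex("((https?://|www\\.)([A-Z0-9.:-]{1,})\\.[0-9A-Z?;~:&+%#=\\-_\\./]{2,})", RegexOptions.Compiled | RegexOptions.IgnoreCase);
    foreach (Match m in r1.Matches(strSource))
    {
        links.Add(m.Value);
    }

    return links;
}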



All the links will then be shown in the label.


Using JavaScript, call document.getElementsByTagName('a').
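
A minimal sketch of that JavaScript approach, assuming the code runs in a browser where `document` is available. The helper name `collectLinks` is hypothetical and not part of any library; it simply reads the `href` of every `<a>` element and skips anchors that have none.

```javascript
// Hypothetical helper: returns the href of every <a> element in a document.
// Pass any object exposing getElementsByTagName; in a browser, pass `document`.
function collectLinks(doc) {
  const anchors = doc.getElementsByTagName('a'); // collection of <a> elements
  const links = [];
  for (let i = 0; i < anchors.length; i++) {
    const href = anchors[i].href;
    if (href) { // skip <a> elements without an href
      links.push(href);
    }
  }
  return links;
}
```

In the browser console, `collectLinks(document)` should return an array of absolute URLs, because the `href` property resolves relative paths against the page address. Unlike the regular-expression approach, this only sees links that are actual anchor elements in the loaded page.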

