How to grab links from an HTML page
Problem description
Hi Everyone,
Greetings for the day!
My problem is that I'm trying to grab all the links in an HTML page. I've tried following http://www.dotnetperls.com/scraping-html and several other articles, but couldn't find a working solution. The problem is that I can't find a "LinkItem" type in my IDE. I also tried adding the System.Net and System.Diagnostics namespaces.
Please suggest how I can get past this problem, whether it's a minor or a major one.
Any help is appreciated.
Thanks,
Sunny K
Recommended answer
Hello, add a TextBox, a Label and a Button to your page (the code below assumes the IDs TextBox1, lblResult and Button1), and in the button's Click handler try something like this:
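// Needed at the top of the code-behind file:
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
using System.Web;

// Matches URL-looking text that starts with http://, https:// or www.
private static readonly Regex UrlRegex = new Regex(
    @"((https?://|www\.)([A-Z0-9.\-:]+)\.[0-9A-Z?;~:&+%#=\-_./]{2,})",
    RegexOptions.Compiled | RegexOptions.IgnoreCase);

protected void Button1_Click(object sender, EventArgs e)
{
    // Download the raw HTML of the page whose URL was typed into TextBox1.
    string html;
    using (WebClient wc = new WebClient())
    {
        html = wc.DownloadString(TextBox1.Text);
    }

    List<string> links = CollectLinks(html);

    // Build an HTML table of the links. Single quotes inside the C# string
    // keep the string literal from ending early.
    StringBuilder sb = new StringBuilder();
    sb.Append("<table><tr><td style='padding-right:40px;'>#</td><td>URL</td></tr>");
    int c = 1;
    foreach (string link in links)
    {
        // HtmlEncode so markup from the scraped page is shown, not rendered.
        sb.Append("<tr><td>" + c + "</td><td>" + HttpUtility.HtmlEncode(link) + "</td></tr>");
        c++;
    }
    sb.Append("</table>");
    lblResult.Text = sb.ToString();
}

// Returns every URL-looking substring of strSource, in order of appearance.
public List<string> CollectLinks(string strSource)
{
    List<string> links = new List<string>();
    foreach (Match m in UrlRegex.Matches(strSource))
    {
        links.Add(m.Value);
    }
    return links;
}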
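All the links found will then be listed in the label. Note that the regex matches any URL-like text in the page source, not only the href attributes of <a> tags.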
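If you only want the href of each <a> tag together with its link text, a minimal sketch follows. "LinkItem" is not a .NET framework type, so the IDE can't find it unless you declare it yourself; tutorials like the one in the question define their own small type for it, and here a plain { href, text } object plays that role. The function name findLinks is hypothetical, and a regex like this only copes with simple, well-formed markup (it would also match attributes such as data-href); a real HTML parser is more robust. The pattern itself should carry over to .NET's Regex class unchanged.

```javascript
// Hypothetical helper: finds <a ... href="..."> tags in an HTML string and
// returns { href, text } objects, a stand-in for a LinkItem-style type.
// Regex-based, so it only handles simple, well-formed markup.
function findLinks(html) {
  // Group 1: the quote character, group 2: the href value, group 3: inner HTML.
  const anchorRe = /<a\s[^>]*?href\s*=\s*(["'])(.*?)\1[^>]*>([\s\S]*?)<\/a>/gi;
  const links = [];
  for (const m of html.matchAll(anchorRe)) {
    links.push({
      href: m[2],
      // Drop nested tags (e.g. <b>) and trim whitespace from the link text.
      text: m[3].replace(/<[^>]+>/g, "").trim(),
    });
  }
  return links;
}

const sample =
  '<p><a href="https://example.com/">Example</a> and ' +
  "<a class='x' href='/about'><b>About</b> us</a></p>";
console.log(findLinks(sample));
```

Capturing the opening quote and requiring the same quote to close the value (\1) lets the one pattern handle both single- and double-quoted attributes.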
Alternatively, on the client side, use document.getElementsByTagName('a') in JavaScript.
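A minimal browser-side sketch of that approach, assuming you want the raw href values of the page that is already loaded in the browser. The function name collectHrefs is hypothetical. Unlike the server-side answer, this only sees the current page's DOM and cannot read another site's HTML by itself.

```javascript
// Collects the href attribute of every <a> element under `root`.
// In a browser, call collectHrefs(document). The DOM method name is
// getElementsByTagName (plural "Elements").
function collectHrefs(root) {
  const anchors = root.getElementsByTagName("a");
  const hrefs = [];
  for (let i = 0; i < anchors.length; i++) {
    const href = anchors[i].getAttribute("href");
    // Anchors used only as named targets (<a name="...">) have no href; skip them.
    if (href !== null) {
      hrefs.push(href);
    }
  }
  return hrefs;
}

// Print every link on the current page when running in a browser.
if (typeof document !== "undefined") {
  console.log(collectHrefs(document));
}
```

getAttribute("href") returns the value exactly as written in the markup (possibly relative); reading the anchor's href property instead gives the resolved absolute URL.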