Parse links from HTML source


Question

How can I parse the links (link1.php, link2.php, ...) from the source shown below and add them, for example, to a ListBox?
I want to parse only the links where the markup contains <span class="visitext">VISIT</span>.
I tried the functions GetElementsByTagName, GetElementById and GetAttribute, but I simply do not know how to add the "VISIT" condition.

<span class="visit" style="float: left; padding-removed 10px;"><a href="link1.php" target="_blank"><span class="ui-icon ui-icon-circle-triangle-e" style="float: left;"></span> <span class="visitext">VISIT</span></a></span>
<span class="visit" style="float: left; padding-removed 10px;"><a href="link2.php" target="_blank"><span class="ui-icon ui-icon-circle-triangle-e" style="float: left;"></span> <span class="visitext">VISITED</span></a></span>
<span class="visit" style="float: left; padding-removed 10px;"><a href="link3.php" target="_blank"><span class="ui-icon ui-icon-circle-triangle-e" style="float: left;"></span> <span class="visitext">VISITED</span></a></span>
<span class="visit" style="float: left; padding-removed 10px;"><a href="link4.php" target="_blank"><span class="ui-icon ui-icon-circle-triangle-e" style="float: left;"></span> <span class="visitext">VISIT</span></a></span>

Recommended answer

Well, it seems to me that simple string mashing will work, given that the link is always in the same place, but I'd say regular expressions are the most robust way to do it.
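As a rough illustration of the string-based approach, the sketch below scans for each anchor and keeps its href only when the exact text `<span class="visitext">VISIT</span>` appears before that anchor closes. The class name `VisitLinkScanner` and method name `Extract` are hypothetical, and the code assumes every anchor has the same shape as the markup in the question (double-quoted `href` written directly after `<a `).

```csharp
using System;
using System.Collections.Generic;

public static class VisitLinkScanner
{
    // Returns the href of every anchor whose visitext span reads exactly "VISIT".
    // Assumption: anchors look like <a href="..." ...> ... </a>, as in the question.
    public static List<string> Extract(string html)
    {
        const string hrefStart = "<a href=\"";
        const string visitMarker = "<span class=\"visitext\">VISIT</span>";
        var links = new List<string>();
        int pos = 0;

        while ((pos = html.IndexOf(hrefStart, pos, StringComparison.Ordinal)) >= 0)
        {
            int valueStart = pos + hrefStart.Length;
            int valueEnd = html.IndexOf('"', valueStart);
            int anchorEnd = html.IndexOf("</a>", valueStart, StringComparison.Ordinal);
            if (valueEnd < 0 || anchorEnd < 0)
                break; // malformed tail, stop scanning

            // Keep the link only if the marker occurs inside this anchor.
            int marker = html.IndexOf(visitMarker, valueStart, StringComparison.Ordinal);
            if (marker >= 0 && marker < anchorEnd)
                links.Add(html.Substring(valueStart, valueEnd - valueStart));

            pos = anchorEnd; // continue after this anchor
        }
        return links;
    }
}
```

Called on the four lines of markup from the question, `VisitLinkScanner.Extract(html)` should yield link1.php and link4.php, which can then be added to a ListBox. Any change in attribute order or quoting would break this scan, which is the usual argument for a parser or a regular expression instead.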


If you know the document is well-formed XHTML, you can use XPath and/or LINQ. Sorry, I don't know VB, but hopefully this is self-explanatory.

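using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

...

// XDocument.Parse needs a single root element, so wrap a multi-span fragment first.
var xhtml = "put your xhtml here";
var xml = XDocument.Parse(xhtml);
// Select the <a> elements whose visitext span reads exactly "VISIT".
// XPathSelectElements returns elements only, so href is read from each
// element instead of being selected with /@href in the XPath.
var visitNodes = xml.XPathSelectElements("//span[@class='visit']/a[span[@class='visitext']='VISIT']");
var links = from n in visitNodes select (string)n.Attribute("href");

foreach (var l in links)
    Console.WriteLine(l);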


If you're stuck with .NET 2.0 or older, you can do the same thing with a few more lines of code using XmlDocument instead of XDocument.
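A minimal sketch of that XmlDocument variant follows. The class and method names are hypothetical, the `<root>` wrapper is an assumption added because the fragment in the question has several top-level spans, and the XPath includes the "VISIT" condition from the question. It avoids `var` and LINQ so that it stays within what older compilers accept.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class VisitLinksXmlDocument
{
    // Returns the href values of anchors whose visitext span reads exactly "VISIT".
    public static List<string> Extract(string xhtml)
    {
        XmlDocument doc = new XmlDocument();
        // Assumption: wrap the fragment so the document has a single root element.
        doc.LoadXml("<root>" + xhtml + "</root>");

        List<string> links = new List<string>();
        // SelectNodes can return attribute nodes, so @href is selected directly.
        XmlNodeList nodes = doc.SelectNodes(
            "//span[@class='visit']/a[span[@class='visitext']='VISIT']/@href");
        foreach (XmlNode node in nodes)
            links.Add(node.Value);
        return links;
    }
}
```

`LoadXml` throws an `XmlException` if the markup is not well-formed, so this only applies when the page really is valid XHTML.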

If the HTML isn't well-formed, you'll need to use regular expressions.
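One possible shape for such a regular expression is sketched below. It is not the resource the answer originally pointed to; the pattern, the class name `VisitLinksRegex` and the method name `Extract` are assumptions written against the markup in the question. The pattern captures the `href` of an anchor and then requires a `visitext` span containing exactly "VISIT" before the anchor's closing `</a>`.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VisitLinksRegex
{
    // Group 1 captures the href value. The (?:(?!</a>).)*? part walks forward
    // without crossing the end of the current anchor.
    private static readonly Regex VisitLink = new Regex(
        @"<a\s[^>]*?href\s*=\s*[""']([^""']*)[""'][^>]*>" +
        @"(?:(?!</a>).)*?" +
        @"<span\s+class\s*=\s*[""']visitext[""']\s*>\s*VISIT\s*</span>",
        RegexOptions.Singleline);

    // Returns the href of every anchor whose visitext span reads "VISIT".
    public static List<string> Extract(string html)
    {
        var links = new List<string>();
        foreach (Match m in VisitLink.Matches(html))
            links.Add(m.Groups[1].Value);
        return links;
    }
}
```

Because the pattern requires `</span>` right after "VISIT" (allowing only whitespace), anchors labelled "VISITED" are skipped. For the markup in the question, `VisitLinksRegex.Extract(html)` should return link1.php and link4.php.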

