Extract text between tags using VB.NET


Question

How can I extract MyText from between the tags below?

<span class="word">
<a href="#0">MyText</a>
</span>

I want to extract MyText. This pattern occurs multiple times in the page, each time with a different MyText.
The value of href also changes, for example href="#1", href="#2", href="#40".

Thanks in advance.

Recommended answer

The first and best option applies if you can rely on the HTML being well-formed XML, which .NET supports well. Choose one of the following approaches:


  1. Use the System.Xml.XmlDocument class. It implements the DOM interface; this is the easiest way and is good enough if the document is not too big.
    See http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.aspx.
  2. Use the System.Xml.XmlTextReader class; this is the fastest way of reading, especially if you need to skip some data.
    See http://msdn.microsoft.com/en-us/library/system.xml.xmlreader.aspx.
  3. Use the System.Xml.Linq.XDocument class; this approach is similar to XmlDocument and additionally supports LINQ to XML programming.
    See http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.aspx, http://msdn.microsoft.com/en-us/library/bb387063.aspx.
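As an illustration of the third approach, here is a minimal sketch using XDocument and LINQ to XML. It assumes the markup is well-formed XML with a single root element; the <root> wrapper, the sample words, and the names ExtractLinkText and ExtractWords are hypothetical and not part of the original question.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

Module ExtractLinkText
    ' Returns the trimmed text of every <a> element that is a direct child
    ' of a <span class="word"> element, whatever its href value is.
    Function ExtractWords(xml As String) As List(Of String)
        ' XDocument.Parse throws XmlException if the input is not well-formed XML.
        Dim doc As XDocument = XDocument.Parse(xml)
        Return doc.Descendants("span").
            Where(Function(s) CType(s.Attribute("class"), String) = "word").
            Elements("a").
            Select(Function(a) a.Value.Trim()).
            ToList()
    End Function

    Sub Main()
        ' Hypothetical sample input. The <root> wrapper is an assumption:
        ' XDocument needs exactly one root element.
        Dim sample As String =
            "<root>" &
            "<span class=""word""><a href=""#0"">First</a></span>" &
            "<span class=""word""><a href=""#1"">Second</a></span>" &
            "<span class=""other""><a href=""#2"">Ignored</a></span>" &
            "</root>"

        For Each word As String In ExtractWords(sample)
            Console.WriteLine(word)
        Next
    End Sub
End Module
```

With this sample input the loop prints First and Second; the third span is skipped because its class is not "word". The href attribute is never inspected, so it does not matter that its value changes from one occurrence to the next.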



If this is not the case, which is unfortunately very likely, you will need an HTML parser that can deal with markup that is not well-formed XML. Try this one: http://www.majestic12.co.uk/projects/html_parser.php.
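If pulling in a parser is not practical, a regular expression can cover the exact pattern shown in the question. The sketch below is not the parser recommended above and is not a general way to handle HTML: it assumes every occurrence looks like the sample (a span with class "word" directly wrapping a single link whose href is # followed by digits, with double-quoted attributes). The names RegexFallback, Pattern and ExtractWords are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module RegexFallback
    ' Matches <span class="word"> <a href="#N">text</a> </span> and captures
    ' the link text. Assumes double-quoted attributes in exactly this order.
    Private ReadOnly Pattern As New Regex(
        "<span\s+class\s*=\s*""word""\s*>\s*" &
        "<a\s+href\s*=\s*""#\d+""\s*>(?<text>.*?)</a>\s*</span>",
        RegexOptions.IgnoreCase Or RegexOptions.Singleline)

    ' Returns the trimmed link text of every occurrence of the pattern.
    Function ExtractWords(html As String) As List(Of String)
        Dim result As New List(Of String)
        For Each m As Match In Pattern.Matches(html)
            result.Add(m.Groups("text").Value.Trim())
        Next
        Return result
    End Function

    Sub Main()
        ' Hypothetical input that is not well-formed XML (unclosed <p> and <br>).
        Dim sample As String =
            "<p><span class=""word"">" & Environment.NewLine &
            "<a href=""#0""> First</a>" & Environment.NewLine &
            "</span><br><span class=""word""><a href=""#40"">Second</a></span>"

        For Each word As String In ExtractWords(sample)
            Console.WriteLine(word)
        Next
    End Sub
End Module
```

This trades robustness for simplicity: any change in attribute order, quoting, or nesting will make the pattern miss matches, which is the situation a real HTML parser is meant to handle.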

—SA

