Extract text between tags using vb.net
Question
How can I extract MyText from between the tags below?
<span class="word">
<a href="#0">MyText</a>
</span>
I want to extract MyText. MyText has multiple occurrences.
The value of href also changes, like href="#1", href="#2", href="#40".
Thanks in advance
Recommended answer
The first and best option applies if you can rely on the HTML being well-formed XML, which is well supported by .NET. Choose one of the following ways:
- Use the System.Xml.XmlDocument class. It implements the DOM interface; this way is the easiest, and good enough if the document is not too big. See: http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.aspx
- Use the System.Xml.XmlTextReader class; this is the fastest way of reading, especially if you need to skip some data. See: http://msdn.microsoft.com/en-us/library/system.xml.xmlreader.aspx
- Use the System.Xml.Linq.XDocument class; this is the most adequate way, similar to that of XmlDocument, and it supports LINQ to XML programming. See: http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.aspx and http://msdn.microsoft.com/en-us/library/bb387063.aspx
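As a rough illustration of the XDocument and XmlDocument options, here is a minimal sketch. It assumes the markup is well-formed XML and that the repeated span elements sit under a single root element (the <root> wrapper and the names ExtractWithXDocument and ExtractWithXmlDocument are made up for this example, not part of the original question). Because it matches on the span's class and not on href, the changing href values do not matter.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml
Imports System.Xml.Linq

Module ExtractLinkText

    ' LINQ to XML: text of every <a> that is a child of <span class="word">.
    ' Hypothetical helper; throws XmlException if the input is not well-formed XML.
    Function ExtractWithXDocument(xml As String) As List(Of String)
        Dim doc As XDocument = XDocument.Parse(xml)
        Return doc.Descendants("span").
            Where(Function(s) s.Attribute("class") IsNot Nothing AndAlso
                              s.Attribute("class").Value = "word").
            Elements("a").
            Select(Function(a) a.Value.Trim()).
            ToList()
    End Function

    ' DOM + XPath: the same selection using XmlDocument.
    Function ExtractWithXmlDocument(xml As String) As List(Of String)
        Dim doc As New XmlDocument()
        doc.LoadXml(xml)
        Dim result As New List(Of String)
        For Each node As XmlNode In doc.SelectNodes("//span[@class='word']/a")
            result.Add(node.InnerText.Trim())
        Next
        Return result
    End Function

    Sub Main()
        ' Assumed sample input: the fragment from the question, repeated
        ' and wrapped in a single root element so that it parses as XML.
        Dim sample As String =
            "<root>" &
            "<span class=""word""><a href=""#0"">MyText</a></span>" &
            "<span class=""word""><a href=""#1"">Other</a></span>" &
            "</root>"

        For Each linkText As String In ExtractWithXDocument(sample)
            Console.WriteLine(linkText)
        Next
    End Sub

End Module
```

Both helpers return one entry per matching link, in document order. If parsing fails on real pages, the markup is probably not well-formed XML, and an HTML parser is the better route.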
If this is not the case, which is unfortunately very likely, you will need an HTML parser that can deal with code that is not well-formed XML. Try this one: http://www.majestic12.co.uk/projects/html_parser.php