Library to generate .NET XmlDocument from HTML tag soup

Question

I'm looking for a .NET library that can generate a clean XML tree, ideally a System.Xml.XmlDocument, from invalid HTML code. That is, it should make the kind of best-effort guesses, repairs, and substitutions browsers make when confronted with this situation, and generate a pretend XmlDocument. The library should also be well-maintained. :)

I realize this is a lot (too much?) to ask, and I would appreciate any useful leads. There seem to be a fair number of implementations of this for Java, but I would rather not generate my own bindings. So far for .NET I have found http://www.majestic12.co.uk/projects/html_parser.php, http://users.rcn.com/creitzel/tidy.html#dotnet, and http://sourceforge.net/projects/tidyfornet .

I have not yet built or tested any of these, but judging from the (sparse) docs and infrequent updates, they do not seem to offer what I'm looking for. So what recommendations do you have, either among these choices or from your past experience?

Recommended answer

The HTML Agility Pack is highly rated. It will certainly do the parsing / best guess etc.

The model is intentionally similar to XmlDocument, including SelectNodes etc. for querying.
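As a rough illustration of that XmlDocument-like model, here is a minimal sketch that loads a malformed fragment and queries it with XPath. It assumes the HtmlAgilityPack NuGet package is referenced; the sample markup, the class name, and the XPath expression are made up for illustration and are not from the original answer.

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package: HtmlAgilityPack

class QueryExample
{
    static void Main()
    {
        // Hypothetical tag soup: the <p>, <b>, <body> and <html> elements are never closed.
        string soup = "<html><body><p>See <a href=\"http://example.com\">this <b>link</a><p>Second paragraph";

        var doc = new HtmlDocument();
        doc.LoadHtml(soup); // parses leniently instead of throwing on malformed markup

        // SelectNodes takes an XPath expression, much like XmlNode.SelectNodes.
        // One difference to watch for: it returns null when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links != null)
        {
            foreach (HtmlNode link in links)
            {
                // GetAttributeValue falls back to the default if the attribute is missing.
                Console.WriteLine(link.GetAttributeValue("href", string.Empty));
            }
        }
    }
}
```

Note that the nodes returned are HtmlNode objects, not System.Xml.XmlNode, so code written against XmlDocument needs adapting even though the method names look familiar.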

If you need XHTML output, there is an OptionOutputAsXml flag; I assume that setting this to true and calling Save results in XHTML.
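One way to test that assumption and end up with a real System.Xml.XmlDocument is sketched below: set OptionOutputAsXml, save to a string, and feed the string to XmlDocument.LoadXml. This assumes the HtmlAgilityPack NuGet package; the helper name ToXmlDocument and the sample input are hypothetical. Whether the saved output is well-formed XML for every input is not guaranteed, so LoadXml may still throw an XmlException on some pages.

```csharp
using System;
using System.IO;
using System.Xml;
using HtmlAgilityPack; // third-party NuGet package: HtmlAgilityPack

class XmlExample
{
    // Hypothetical helper: HTML tag soup in, XmlDocument out.
    static XmlDocument ToXmlDocument(string html)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.OptionOutputAsXml = true; // ask Save to emit XML-style output
        htmlDoc.LoadHtml(html);

        using (var writer = new StringWriter())
        {
            htmlDoc.Save(writer);

            var xmlDoc = new XmlDocument();
            // Throws XmlException if the saved text is not well-formed XML.
            xmlDoc.LoadXml(writer.ToString());
            return xmlDoc;
        }
    }

    static void Main()
    {
        // Hypothetical input with unclosed <p> and <b> elements.
        XmlDocument xml = ToXmlDocument("<html><body><p>First<p>Second <b>bold</body></html>");
        Console.WriteLine(xml.OuterXml);
    }
}
```

Going through a string keeps the sketch short; for large documents, saving to a stream or an XmlWriter would avoid holding the whole serialized page in memory.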
