Parsing XHTML with DTD using XDocument

Question

I need to get plain text from XHTML documents.

I am sure I have already read somewhere here that XDocument on WP7 does not support DTDs, but I cannot find it now. In any case, when I try to parse XHTML with a DTD using XDocument, it throws NotSupportedException. The last call in the stack trace is System.Xml.XmlTextReaderImpl.ParseDoctypeDecl().

The result is exactly the same even if I supply a dummy XmlResolver: it never actually gets called. (I was following the answer in this question.)

So I assume that WP7 really doesn't support it.

Still, I need to parse XHTML documents. So far I have come up with two (more or less workable) solutions:

I can parse the document if I remove the DTD declaration. However, the XHTML may contain character entities, and an exception is thrown for any entity that is not one of the predefined XML entities. So this solution works only for some XHTML documents.

I thought of using a regex. It is quite easy to remove all the HTML tags, but the 'entity problem' remains, as I don't think running a replace for every entity is a real/good solution.
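For comparison, here is roughly what the regex route looks like when the entity replacement is delegated to an existing decoding table instead of being written by hand. This is a hedged Python sketch rather than WP7 code; `strip_tags` and `TAG_RE` are hypothetical names, and whether an equivalent decoder is available on WP7 is an assumption that would need checking.

```python
import html
import re

# Naive tag matcher: anything between '<' and the next '>'.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(markup):
    """Remove tags first, then decode entities, so that an escaped '&lt;b&gt;'
    survives as literal text instead of being stripped as a tag."""
    no_tags = TAG_RE.sub("", markup)
    return html.unescape(no_tags)
```

The pattern also removes the DOCTYPE line, but it is fragile: a `>` inside a comment or attribute value ends the match early, and the contents of `script` and `style` elements are kept as text.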

Has anyone faced or solved this? Can you give me some advice, or correct me if I am wrong about something? Thanks.

Recommended answer

HTML Agility Pack is a library for parsing HTML documents. According to its discussion forum, it has a version for WP7:

http://htmlagilitypack.codeplex.com/discussions/225113
