How to ignore whitespace while reading a file to produce an XML DOM

Question

I'm trying to read a file to produce a DOM Document, but the file contains whitespace and newlines. I'm trying to ignore them, without success:

DocumentBuilderFactory docfactory=DocumentBuilderFactory.newInstance();
docfactory.setIgnoringElementContentWhitespace(true);

I see in the Javadoc that the setIgnoringElementContentWhitespace method only takes effect when the validating flag is enabled, but I don't have a DTD or XML Schema for the document.

What can I do?

Update

I don't like the idea of adding <!ELEMENT ...> declarations myself, and I have tried the solution proposed in the forum that Tomalak pointed to, but it doesn't work. I'm using Java 1.6 in a Linux environment. If nothing else is proposed, I think I'll write a few methods to ignore whitespace text nodes.

Recommended answer

'IgnoringElementContentWhitespace' is not about removing all pure-whitespace text nodes. It only removes whitespace nodes whose parents are described in the schema as having ELEMENT content; that is to say, the parents only contain other elements and never text.

If you don't have a schema (DTD or XSD) in use, element content defaults to MIXED, so this parameter will never have any effect. (Unless the parser provides a non-standard DOM extension to treat all unknown elements as containing ELEMENT content, which, as far as I know, the parsers available for Java do not.)
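As a rough illustration of the point above, the following sketch parses a small document that has no DTD with the flag switched on and counts the root's child nodes. The class name NoSchemaWhitespaceDemo and the sample XML are made up for this example; it assumes the JDK's default JAXP parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original question.
class NoSchemaWhitespaceDemo {

    // Parses xml with setIgnoringElementContentWhitespace(true) but no DTD,
    // and returns how many child nodes the root element ends up with.
    static int countRootChildren(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setIgnoringElementContentWhitespace(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root>\n  <a/>\n  <b/>\n</root>";
        // Without a schema the whitespace text nodes are kept, so the count
        // is larger than the two element children.
        System.out.println(countRootChildren(xml));
    }
}
```

The whitespace runs between the tags survive as text nodes because, without element declarations, the parser has no way of knowing that root has element-only content.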

You could hack the document on its way into the parser to include the schema information, for example by adding an internal subset to the <!DOCTYPE ... [...]> declaration containing <!ELEMENT ...> declarations, and then use the IgnoringElementContentWhitespace parameter.
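A minimal sketch of the internal-subset approach, assuming the document is simple enough that its element declarations can be written by hand. The class name InternalSubsetDemo, the element names, and the sample document are all invented for illustration. Validation is enabled because the Javadoc ties the flag to validating mode, and the error handler rethrows validation errors so that a DTD that does not match the document fails loudly.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class, not part of the original answer.
class InternalSubsetDemo {

    // Sample document whose internal subset declares root as element-only content.
    static final String XML_WITH_DTD =
            "<!DOCTYPE root [\n"
            + "  <!ELEMENT root (a, b)>\n"
            + "  <!ELEMENT a EMPTY>\n"
            + "  <!ELEMENT b EMPTY>\n"
            + "]>\n"
            + "<root>\n  <a/>\n  <b/>\n</root>";

    static Document parseValidating(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);                       // DTD validation
            factory.setIgnoringElementContentWhitespace(true); // drop ignorable whitespace
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseValidating(XML_WITH_DTD);
        // Only the element children of root should remain.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

The drawback is the one raised in the question's update: every element in the document needs a declaration, so this only pays off when the structure is known and stable.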

Or, possibly easier, you could just strip out the whitespace nodes, either in a post-process or as they come in, using an LSParserFilter.
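The post-processing option might look something like the sketch below. The class WhitespaceStripper and its method names are hypothetical. It walks the tree after parsing and removes every text node that contains only whitespace, which is broader than what IgnoringElementContentWhitespace does: whitespace-only text inside genuinely mixed content is removed too, so it is only appropriate when that whitespace carries no meaning.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer.
class WhitespaceStripper {

    // Recursively removes whitespace-only text nodes below the given node.
    // CDATA sections have their own node type and are left untouched.
    static void stripWhitespaceNodes(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before possibly removing
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespaceNodes(child);
            }
            child = next;
        }
    }

    // Plain non-validating parse, used here only to build a sample DOM.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<root>\n  <a> text </a>\n  <b/>\n</root>");
        stripWhitespaceNodes(doc.getDocumentElement());
        // Text nodes with real content, such as " text ", are kept as they are.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

This needs no DTD and works the same way regardless of how the Document was produced.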
