How to keep whitespace before document element when parsing with Java?


Question

In my application, I alter parts of XML files that begin like this:

<?xml version="1.0" encoding="UTF-8"?>
<!-- $Id: version control yadda-yadda $ -->

<myElement>
...

Note the blank line before <myElement>. After loading, altering and saving, the result is far from pleasing:

<?xml version="1.0" encoding="UTF-8"?>
<!-- $Id: version control yadda-yadda $ --><myElement>
...

I found out that the whitespace (one newline) between the comment and the document element is not represented in the DOM at all. The following self-contained code reproduces the issue reliably:

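import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

// Input: a comment, a newline, then the document element.
String source =
    "<?xml version=\"1.0\" encoding=\"UTF-16\"?>\n<!-- foo -->\n<empty/>";
byte[] sourceBytes = source.getBytes("UTF-16");

DocumentBuilder builder =
    DocumentBuilderFactory.newInstance().newDocumentBuilder();
// ByteArrayInputStream is the public java.io class for reading from a byte array.
Document doc = builder.parse(new ByteArrayInputStream(sourceBytes));

// Serialize the DOM back to a string with the DOM Level 3 Load and Save API.
DOMImplementationLS domImplementation =
    (DOMImplementationLS) doc.getImplementation();
LSSerializer lsSerializer = domImplementation.createLSSerializer();
System.out.println(lsSerializer.writeToString(doc));

// output: <?xml version="1.0" encoding="UTF-16"?>\n<!-- foo --><empty/>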

Does anyone have an idea how to avoid this? Essentially, I want the output to be the same as the input. (I know that the XML declaration will be regenerated because it is not part of the DOM, but that is not an issue here.)

Recommended answer

The root cause is that the standard DOM Level 3 model cannot represent Text nodes as children of a Document without breaking the spec. Whitespace there will be dropped by any compliant parser. The only node types allowed as children of a Document are:

Document -- 
    Element (maximum of one),
    ProcessingInstruction,
    Comment,
    DocumentType (maximum of one)
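
To see this restriction in practice, here is a minimal sketch that lists the direct children of a parsed Document and then tries to attach a Text node at the top level. The class name DocumentChildren and its helper methods are made up for illustration, and the behavior described assumes the parser bundled with the JDK.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original question or answer.
class DocumentChildren {

    /** Parses XML text into a DOM Document using the default JAXP parser. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Returns the kinds of the Document's direct children, in document order. */
    static List<String> childKinds(String xml) throws Exception {
        List<String> kinds = new ArrayList<>();
        NodeList children = parse(xml).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            switch (children.item(i).getNodeType()) {
                case Node.ELEMENT_NODE: kinds.add("Element"); break;
                case Node.COMMENT_NODE: kinds.add("Comment"); break;
                case Node.PROCESSING_INSTRUCTION_NODE: kinds.add("ProcessingInstruction"); break;
                case Node.DOCUMENT_TYPE_NODE: kinds.add("DocumentType"); break;
                case Node.TEXT_NODE: kinds.add("Text"); break;
                default: kinds.add("Other"); break;
            }
        }
        return kinds;
    }

    /** Returns true if adding a Text node directly under the Document is refused. */
    static boolean rejectsTopLevelText(String xml) throws Exception {
        Document doc = parse(xml);
        try {
            doc.appendChild(doc.createTextNode("\n"));
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.HIERARCHY_REQUEST_ERR;
        }
    }

    public static void main(String[] args) throws Exception {
        // Comment, blank line, then the document element, as in the question.
        String xml = "<?xml version=\"1.0\"?>\n<!-- foo -->\n\n<empty/>";
        System.out.println(childKinds(xml));
        System.out.println(rejectsTopLevelText(xml));
    }
}
```

The whitespace between the comment and the element never shows up as a child, so nothing in the DOM remains for a serializer to write back.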

If you require a standards-compliant solution and the objective is readability rather than 100% reproduction, I would handle it in your output mechanism.
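
One possible shape for such an output mechanism, offered only as a sketch: serialize each direct child of the Document separately with a JAXP Transformer and join the pieces with a separator of your choice. The class name TopLevelLineBreaks and the separator parameter are assumptions for illustration. The sketch skips DocumentType nodes, so it only suits documents without a DOCTYPE, and it leaves writing the XML declaration to the caller.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original question or answer.
class TopLevelLineBreaks {

    /** Parses XML text into a DOM Document using the default JAXP parser. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /**
     * Serializes every direct child of the Document on its own and joins the
     * results with the given separator. No XML declaration is written.
     */
    static String serializeChildren(Document doc, String separator) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringBuilder out = new StringBuilder();
        for (Node child = doc.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.DOCUMENT_TYPE_NODE) {
                continue; // assumption: DOCTYPE is not handled in this sketch
            }
            StringWriter piece = new StringWriter();
            transformer.transform(new DOMSource(child), new StreamResult(piece));
            if (out.length() > 0) {
                out.append(separator);
            }
            out.append(piece);
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<?xml version=\"1.0\"?>\n<!-- foo -->\n<empty/>");
        // The caller writes the declaration, then the children one per line.
        System.out.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        System.out.println(serializeChildren(doc, "\n"));
    }
}
```

Passing "\n\n" as the separator would put a blank line before the document element, as in the original files. This favors readability and does not reproduce the input byte for byte.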
