transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8") is NOT working

Problem description

I have the following method to write an XMLDom to a stream:

public void writeToOutputStream(Document fDoc, OutputStream out) throws Exception {
    fDoc.setXmlStandalone(true);
    DOMSource docSource = new DOMSource(fDoc);
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.METHOD, "xml");
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.setOutputProperty(OutputKeys.INDENT, "no");
    transformer.transform(docSource, new StreamResult(out));
}

I am testing some other XML functionality, and this is just the method that I use to write to a file. My test program generates 33 test cases where files are written out. 28 of them have the following header:

<?xml version="1.0" encoding="UTF-8"?>...

But for some reason, 1 of the test cases now produces:

<?xml version="1.0" encoding="ISO-8859-1"?>...

And 4 of them produce:

<?xml version="1.0" encoding="Windows-1252"?>...

As you can clearly see, I am setting the ENCODING output key to UTF-8. These tests used to work on an earlier version of Java. I have not run the tests in a while (more than a year), but running them today on "Java(TM) SE Runtime Environment (build 1.6.0_22-b04)" I get this funny behavior.

I have verified that the documents causing the problem were read from files that originally had those encodings. It seems that the new versions of the libraries are attempting to preserve the encoding of the source file that was read. But that is not what I want ... I really do want the output to be in UTF-8.
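
One way to check that suspicion is to ask the parsed Document which encoding it recorded from its source. The following is only a sketch: the class name EncodingProbe, the helper declaredEncoding, and the sample XML are made up for illustration, while getXmlEncoding() is the standard DOM Level 3 method on org.w3c.dom.Document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper, not part of the original project.
final class EncodingProbe {

    // Returns the encoding named in the XML declaration of the parsed input,
    // or null when the parser does not know of one.
    static String declaredEncoding(byte[] xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
        return doc.getXmlEncoding();
    }

    public static void main(String[] args) throws Exception {
        // Sample input declared (and encoded) as ISO-8859-1.
        byte[] latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>caf\u00e9</a>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("declared encoding: " + declaredEncoding(latin1));
    }
}
```

If the value reported for a failing document matches the header that ends up in the output file, that supports the idea that the serializer is picking up the source encoding rather than the ENCODING output property.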

Does anyone know of any other factor that might cause the transformer to ignore the UTF-8 encoding setting? Is there anything else that has to be set on the document to make it forget the encoding of the file that was originally read?

Update:

I checked out the same project on another machine, then built and ran the tests there. On that machine all the tests pass! All the files have "UTF-8" in their header. That machine has "Java(TM) SE Runtime Environment (build 1.6.0_29-b11)". Both machines are running Windows 7. On the new machine, which works correctly, jdk1.5.0_11 is used to make the build, but on the old machine jdk1.6.0_26 is used. The libraries used for both builds are exactly the same. Can it be a JDK 1.6 incompatibility with 1.5 at build time?

Update:

After 4.5 years, the Java library is still broken, but thanks to the suggestion by Vyrx below, I finally have a proper solution!

public void writeToOutputStream(Document fDoc, OutputStream out) throws Exception {
    fDoc.setXmlStandalone(true);
    DOMSource docSource = new DOMSource(fDoc);
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.METHOD, "xml");
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    transformer.setOutputProperty(OutputKeys.INDENT, "no");
    out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>".getBytes("UTF-8"));
    transformer.transform(docSource, new StreamResult(out));
}

The solution is to disable the writing of the header, and to write the correct header just before serializing the XML to the output stream. Lame, but it produces the correct results. Tests broken over 4 years ago are now running again!
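
For anyone who wants to confirm the workaround on their own runtime, here is a small self-contained check. It is a sketch under assumptions: the class name Utf8HeaderCheck, the helpers parse and roundTrip, and the sample document are invented for the example; only the body of writeToOutputStream follows the method shown above (without the setXmlStandalone call, since the declaration is omitted anyway).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical check harness; names other than writeToOutputStream are illustrative.
final class Utf8HeaderCheck {

    static final String HEADER = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";

    // Same idea as the workaround: suppress the generated declaration
    // and write a fixed UTF-8 declaration by hand first.
    static void writeToOutputStream(Document doc, OutputStream out) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.INDENT, "no");
        out.write(HEADER.getBytes(StandardCharsets.UTF_8));
        transformer.transform(new DOMSource(doc), new StreamResult(out));
    }

    static Document parse(byte[] xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
    }

    // Parses the given XML bytes, rewrites the document, and returns the bytes written.
    static byte[] roundTrip(byte[] xml) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeToOutputStream(parse(xml), out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Source document declared and encoded as ISO-8859-1, like the failing test cases.
        byte[] latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>caf\u00e9</a>"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] written = roundTrip(latin1);
        String text = new String(written, StandardCharsets.UTF_8);
        System.out.println("header is UTF-8: " + text.startsWith(HEADER));
        // Re-parsing the output confirms the bytes really are valid UTF-8 XML.
        System.out.println("text survives: "
                + "caf\u00e9".equals(parse(written).getDocumentElement().getTextContent()));
    }
}
```

Re-parsing the written bytes matters here: a hand-written header only helps if the serializer actually emits UTF-8 bytes after it, and a mismatch would normally show up as a parse error or a changed text value.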

Recommended answer

I had the same problem on Android when serializing emoji characters. When using UTF-8 encoding in the transformer, the output contained HTML-style character entities for the UTF-16 surrogate pairs, which would subsequently break other parsers that read the data.

This is how I ended up solving it:

StringWriter sw = new StringWriter();
sw.write("<?xml version=\"1.0\" encoding=\"UTF-8\" ?>");
Transformer t = TransformerFactory.newInstance().newTransformer();

// UTF-16 works here because the result is a Java String held by a Writer, not bytes written to an output stream
t.setOutputProperty(OutputKeys.ENCODING, "UTF-16");
t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
t.transform(new DOMSource(elementNode), new StreamResult(sw));

return IOUtils.toInputStream(sw.toString(), Charset.forName("UTF-8"));
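
The snippet above depends on IOUtils from Apache Commons IO and on an elementNode variable defined elsewhere. A dependency-free sketch of the same idea might look like the following; the class name XmlToUtf8Stream, the methods serialize and toUtf8Stream, and the sample element are assumptions made for the example, not part of the original answer.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical wrapper around the approach from the answer.
final class XmlToUtf8Stream {

    static final String HEADER = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>";

    // Serializes the node into a Java String behind a hand-written UTF-8 declaration.
    static String serialize(Node node) throws Exception {
        StringWriter sw = new StringWriter();
        sw.write(HEADER);
        Transformer t = TransformerFactory.newInstance().newTransformer();
        // The target is a Writer, so asking for UTF-16 keeps characters unescaped in the String.
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-16");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.transform(new DOMSource(node), new StreamResult(sw));
        return sw.toString();
    }

    // Encodes the serialized String as UTF-8 bytes, replacing IOUtils.toInputStream.
    static InputStream toUtf8Stream(Node node) throws Exception {
        return new ByteArrayInputStream(serialize(node).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Node root = doc.appendChild(doc.createElement("msg"));
        root.setTextContent("hi \uD83D\uDE00"); // U+1F600, a character outside the BMP
        Document back = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(toUtf8Stream(root));
        System.out.println("round trip ok: "
                + "hi \uD83D\uDE00".equals(back.getDocumentElement().getTextContent()));
    }
}
```

The design point is that the transformer never chooses the byte encoding: it only produces characters, and the single getBytes(StandardCharsets.UTF_8) call decides how they are encoded.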
