Producing valid XML with Java and UTF-8 encoding

Views: 24 · Published: 2021/12/27 15:35:05 · Tags: java, xml, encoding, utf-8


Problem description

I am using JAXP to generate and parse an XML document, some of whose fields are loaded from a database.

Code to serialize the XML:

DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.newDocument();
Element root = doc.createElement("test");
root.setAttribute("version", text);
doc.appendChild(root);

DOMSource domSource = new DOMSource(doc);
TransformerFactory tFactory = TransformerFactory.newInstance();

FileWriter out = new FileWriter("test.xml");
Transformer transformer = tFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.transform(domSource, new StreamResult(out));

Code to parse the XML:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("test.xml");

I get the following exception:

[Fatal Error] test.xml:1:4: Invalid byte 1 of 1-byte UTF-8 sequence.
Exception in thread "main" org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
    at com.test.Test.xml(Test.java:27)
    at com.test.Test.main(Test.java:55)

The String text contains u-umlaut and o-umlaut (character codes 0xFC and 0xF6). These are the characters that cause the error. When I escape the string myself, replacing ü and ö with the character references &#252; and &#246;, the problem goes away. Other entities are encoded automatically when I write out the XML.
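To make the byte-level mismatch concrete, here is a minimal sketch (not part of the original question) that prints how ü is represented in ISO-8859-1 and in UTF-8. The class name UmlautBytes and the helper hex are hypothetical. It rests on an assumption the question does not confirm: that the file was written in a single-byte platform encoding such as ISO-8859-1, so ü landed on disk as the lone byte 0xFC, while the XML declaration promised UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class UmlautBytes {
    /** Formats bytes as space-separated uppercase hex, e.g. "C3 BC". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values format as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String u = "\u00FC"; // u-umlaut
        // Single-byte encoding: one byte, 0xFC.
        System.out.println(hex(u.getBytes(StandardCharsets.ISO_8859_1))); // FC
        // UTF-8: two bytes, 0xC3 0xBC.
        System.out.println(hex(u.getBytes(StandardCharsets.UTF_8)));      // C3 BC
    }
}
```

A parser that trusts the UTF-8 declaration expects the two-byte form, so a bare 0xFC would be rejected as a malformed sequence.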

How do I get my output written and read correctly without substituting these characters myself?

(I've read the following questions already:

How to encode characters from Oracle to XML?

Repairing wrong encoding in XML files)
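One possible direction, offered as a sketch under stated assumptions rather than a confirmed answer: FileWriter(String) encodes characters with the platform default charset (which was not guaranteed to be UTF-8 before Java 18), so the bytes on disk may not match the UTF-8 declaration the Transformer emits. Handing the Transformer an OutputStream instead lets it perform the encoding named in OutputKeys.ENCODING itself. The class name Utf8RoundTrip and the methods serialize and parseVersion are hypothetical, and an in-memory stream stands in for test.xml to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class Utf8RoundTrip {
    /** Builds <test version="..."/> and serializes it to bytes as UTF-8. */
    static byte[] serialize(String text) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element root = doc.createElement("test");
        root.setAttribute("version", text);
        doc.appendChild(root);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        // A byte stream (not a Writer) leaves the character encoding to the
        // Transformer, so the bytes agree with the declared encoding.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    /** Parses the bytes and returns the root element's version attribute. */
    static String parseVersion(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
        return doc.getDocumentElement().getAttribute("version");
    }

    public static void main(String[] args) throws Exception {
        String text = "\u00FC\u00F6"; // u-umlaut, o-umlaut
        byte[] xml = serialize(text);
        // Checks that the attribute survives the write/parse round trip.
        System.out.println(text.equals(parseVersion(xml)));
    }
}
```

To write to a file, the same idea would apply with new StreamResult(new FileOutputStream("test.xml")), or with a Writer whose charset is set explicitly, such as new OutputStreamWriter(new FileOutputStream("test.xml"), StandardCharsets.UTF_8). In both cases the stream should be closed after the transform.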
