Invalid XML Character During Unmarshall

Views: 240 | Published: 2019/6/14 19:51:39 | Tags: java xml-serialization jaxb unmarshalling

Question

I am marshalling objects to an XML file using the "UTF-8" encoding. The file is generated successfully, but when I try to unmarshal it back, I get an error:

An invalid XML character (Unicode: 0x{2}) was found in the value of attribute "{1}" and element is "0"

The character is 0x1A (\u001a), which can be encoded in UTF-8 but is illegal in XML. The JAXB Marshaller lets me write this character into the XML file, but the Unmarshaller cannot parse it back. I tried other encodings (UTF-16, ASCII, etc.), but I still get the error.
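The failure can be reproduced without JAXB, which suggests it is not an encoding problem: XML 1.0 defines its legal characters (the Char production) as #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF regardless of the document encoding, and even a numeric character reference such as &#x1A; must refer to a legal character. This sketch, with a hypothetical class name, feeds three small documents to the JDK's built-in JAXP DOM parser:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class InvalidCharRepro {

    /** Returns true if the JDK's built-in parser accepts the document as well-formed. */
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Well-formedness violations, such as an illegal character, end up here.
            System.out.println("Rejected: " + e.getMessage());
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        char sub = (char) 0x1A; // the character from the question
        System.out.println(parses("<item name=\"a.b\"/>"));
        System.out.println(parses("<item name=\"a" + sub + "b\"/>"));
        // A numeric character reference does not help either: XML 1.0 forbids it too.
        System.out.println(parses("<item name=\"a&#x1A;b\"/>"));
    }
}
```

With the default JDK parser the first call should return true and the other two false; the rejection message for the raw character should have the same shape as the one quoted in the question.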

The common solution is to remove or replace this invalid character before parsing the XML. But if we need this character, how can we get the original character back after unmarshalling?
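One possible way to keep the character, sketched here as an assumption rather than an established JAXB feature, is to make the substitution reversible: before marshalling, rewrite every character that XML 1.0 forbids as a plain-text escape, and reverse the rewrite after unmarshalling. The class XmlCharEscaper, its methods and the backslash-u escape format are hypothetical; a literal backslash is doubled so that an escape can never be confused with original text.

```java
public final class XmlCharEscaper {

    private XmlCharEscaper() {
    }

    /** True if the code point matches the XML 1.0 Char production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Call before marshalling: illegal characters become backslash-u escapes, backslashes are doubled. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (cp == '\\') {
                sb.append("\\\\");
            } else if (isXmlChar(cp)) {
                sb.appendCodePoint(cp);
            } else {
                // Every illegal code point (including a lone surrogate) is below 0x10000,
                // so four hex digits are enough.
                sb.append(String.format("\\u%04x", cp));
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    /** Call after unmarshalling: reverses escape(); assumes the input came from escape(). */
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length() && s.charAt(i + 1) == '\\') {
                sb.append('\\');
                i += 2;
            } else if (c == '\\' && i + 5 < s.length() && s.charAt(i + 1) == 'u') {
                sb.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                sb.append(c);
                i++;
            }
        }
        return sb.toString();
    }
}
```

In a JAXB model, escape and unescape could be called from the marshal and unmarshal methods of an XmlAdapter<String, String> attached to the affected fields; that wiring is not shown. Any other program reading the XML would see the escaped text, not the original character.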

While looking for such a solution, I want to replace the invalid characters with a substitute character (for example, a dot ".") before unmarshalling.

I created this class:

public class InvalidXMLCharacterFilterReader extends FilterReader {

    public static final char substitute = '.'; 

    public InvalidXMLCharacterFilterReader(Reader in) {
        super(in);
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {

        int read = super.read(cbuf, off, len);

        if (read == -1)
            return -1;

        for (int readPos = off; readPos < off + read; readPos++) {
            if (!isValid(cbuf[readPos])) {
                cbuf[readPos] = substitute;
            }
        }

        return readPos - off + 1; 
    }

    public boolean isValid(char c) {
        if((c == 0x9)
                || (c == 0xA) 
                || (c == 0xD) 
                || ((c >= 0x20) && (c <= 0xD7FF)) 
                || ((c >= 0xE000) && (c <= 0xFFFD)) 
                || ((c >= 0x10000) && (c <= 0x10FFFF)))
        {
            return true;
        } else
            return false;
    }
}
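One detail of isValid(char c) can be checked in isolation. A Java char is a single UTF-16 code unit (at most 0xFFFF), so the c >= 0x10000 branch can never match, and a legal character outside the Basic Multilingual Plane reaches the reader as two surrogate chars in the range 0xD800-0xDFFF, neither of which passes the other ranges. This sketch (class name hypothetical) copies the method and applies it to both halves of U+1F600:

```java
public class SurrogateCheck {

    // Same logic as isValid(char) in the question.
    static boolean isValid(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)
                || (c >= 0x10000 && c <= 0x10FFFF); // never true: a char is at most 0xFFFF
    }

    public static void main(String[] args) {
        // U+1F600 is a legal XML character, but Java stores it as two chars.
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length());
        for (char c : emoji.toCharArray()) {
            System.out.printf("0x%04X valid=%b%n", (int) c, isValid(c));
        }
    }
}
```

Both halves are reported as invalid, so a char-by-char check with these ranges would replace one legal emoji with two substitute characters.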

This is how I read and unmarshal the file:

// um is presumably the javax.xml.bind.Unmarshaller for the JAXB model
FileReader fileReader = new FileReader(this.getFile());
Reader reader = new InvalidXMLCharacterFilterReader(fileReader);
Object o = um.unmarshal(reader);
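Separately from the filter, FileReader(File) decodes the bytes with the platform default charset, which is UTF-8 by default only from JDK 18 on, while the file was written as UTF-8. A minimal sketch, assuming Java 11 or later for the FileReader(File, Charset) constructor, that pins the charset; the helper name openUtf8 is hypothetical:

```java
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class Utf8Open {

    // Hypothetical helper with the same role as new FileReader(this.getFile())
    // in the question, but decoding the bytes explicitly as UTF-8.
    static Reader openUtf8(File file) throws IOException {
        return new FileReader(file, StandardCharsets.UTF_8); // constructor added in Java 11
    }
}
```

It would slot into the snippet above as new InvalidXMLCharacterFilterReader(openUtf8(this.getFile())).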

Somehow the reader does not replace invalid characters with the character I want, and the resulting XML data cannot be unmarshalled. Is there something wrong with my InvalidXMLCharacterFilterReader class?
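A corrected sketch of the filter, offered as a suggestion rather than the asker's code, and assuming the file contains the raw 0x1A character rather than a character reference. Three changes are made. First, read(char[], int, int) returns read, the count reported by the wrapped reader: as posted, readPos is out of scope at the return statement, and even with readPos declared before the loop, readPos - off + 1 evaluates to read + 1, so the parser would receive one character that was never read from the file. Second, read() is overridden too, because FilterReader.read() delegates straight to the wrapped reader and would bypass the check. Third, surrogate code units are let through so that characters outside the BMP are not replaced.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

public class InvalidXMLCharacterFilterReader extends FilterReader {

    public static final char SUBSTITUTE = '.';

    public InvalidXMLCharacterFilterReader(Reader in) {
        super(in);
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int read = super.read(cbuf, off, len);
        if (read == -1) {
            return -1;
        }
        for (int pos = off; pos < off + read; pos++) {
            if (!isValid(cbuf[pos])) {
                cbuf[pos] = SUBSTITUTE;
            }
        }
        // Report exactly the number of chars the wrapped reader produced.
        return read;
    }

    // FilterReader.read() goes straight to the wrapped reader, so filter it as well.
    @Override
    public int read() throws IOException {
        int c = super.read();
        return (c == -1 || isValid((char) c)) ? c : SUBSTITUTE;
    }

    // 0x20-0xFFFD includes the surrogate range so that surrogate pairs survive.
    // Lone surrogates are not caught by this char-level check.
    static boolean isValid(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xFFFD);
    }
}
```

It plugs into the question's reading code unchanged. The substitution is lossy: after unmarshalling, the dots cannot be told apart from real dots, so this alone does not recover the original character.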
