Java JAXB UTF-8/ISO conversions
Question
I have an XML file that contains non-standard characters (such as a typographic "curly" quote).
I read the XML using UTF-8 / ISO-8859-1 / ASCII and unmarshalled it:
// Decode the HTTP response with a fixed charset (ISO-8859-1 here)
BufferedReader br = new BufferedReader(new InputStreamReader(
        conn.getInputStream(), "ISO-8859-1"));
String output;
StringBuffer sb = new StringBuffer();
while ((output = br.readLine()) != null) {
    // fetch XML (readLine() drops the line terminators)
    sb.append(output);
}
try {
    jc = JAXBContext.newInstance(ServiceResponse.class);
    Unmarshaller unmarshaller = jc.createUnmarshaller();
    // Unmarshal from the already-decoded String
    ServiceResponse OWrsp = (ServiceResponse) unmarshaller
            .unmarshal(new InputSource(new StringReader(sb.toString())));
    // (the rest of the try block is not shown in the question)
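One thing worth checking at this step, offered as an assumption rather than a diagnosis: the snippet above decodes the response as ISO-8859-1 regardless of what the server actually sent. If the bytes were really UTF-8, each non-ASCII character becomes several Latin-1 characters before JAXB ever sees it. The sketch below uses only the standard library; the class name `DecodeDemo` and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public final class DecodeDemo {
    private DecodeDemo() {}

    /** Decodes bytes with a fixed ISO-8859-1 charset, as the reader above does. */
    public static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    /** Decodes the same bytes as UTF-8. */
    public static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumption: the server sent UTF-8. U+2013 (en dash) takes 3 bytes in UTF-8.
        byte[] wire = "10\u201311".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeAsLatin1(wire).length()); // 7: the dash became 3 chars
        System.out.println(decodeAsUtf8(wire).length());   // 5: the dash survived
    }
}
```

If the encoding is declared in the XML prolog, passing the raw stream to `Unmarshaller.unmarshal(InputStream)` lets the parser choose the charset itself, which may avoid the manual decoding step altogether.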
I have an Oracle function that takes iso-8859-1 codes and converts/maps them to "literal" symbols, e.g. "&#x2019;" => "left single quote".
Unmarshalling with JAXB using ISO shows the characters with the ISO conversion just fine, i.e. all the weird single quotes are encoded as "&#x2019;".
So suppose my string is: class of 10–11‐year‐olds (note the unusual hyphen between "11" and "year").
jc = JAXBContext.newInstance(ScienceProductBuilderInfoType.class);
Marshaller m = jc.createMarshaller();
m.setProperty(Marshaller.JAXB_ENCODING, "ISO-8859-1");
//save a temp file
File file2 = new File("tmp.xml");
This saves the following in the file:
class of 10&#x2013;11&#x2010;year&#x2010;olds (this is what I want, so saving to the file works!)
[Side note: I have read the file back using a Java FileReader, and it outputs the above string fine.]
The issue is that the String produced by the JAXB unmarshaller has weird output; for some reason I cannot get the string to contain &#x2013;.
When I:
1: check the unmarshalled XML output:
class of 10?11?year?olds
2: check the file output:
class of 10&#x2013;11&#x2010;year&#x2010;olds
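A possible explanation for the `?` characters, offered as an assumption since the question does not show how the string was inspected: when a Java String is encoded into a charset that lacks a character (U+2013 and U+2010 are not in ISO-8859-1), the encoder substitutes `?`. XML written as ISO-8859-1 avoids that loss by using numeric character references instead. A minimal standard-library sketch (the class name `UnmappableDemo` and its methods are hypothetical):

```java
import java.nio.charset.StandardCharsets;

public final class UnmappableDemo {
    private UnmappableDemo() {}

    /** Encodes to ISO-8859-1 and decodes back, as a Latin-1 console or sink might. */
    public static String roundTripLatin1(String s) {
        // getBytes(Charset) replaces unmappable characters with the charset's
        // replacement byte, which is '?' for ISO-8859-1.
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    /** Returns true if every character of s can be encoded in ISO-8859-1. */
    public static boolean fitsLatin1(String s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String s = "class of 10\u201311\u2010year\u2010olds";
        System.out.println(fitsLatin1(s));      // false
        System.out.println(roundTripLatin1(s)); // class of 10?11?year?olds
    }
}
```

If this is what is happening, the unmarshalled String itself may be intact and only its Latin-1 rendering is lossy; checking `fitsLatin1` on the value before printing or storing it is one way to tell the two cases apart.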
I even tried reading the saved XML file back in and unmarshalling that (in the hope of getting the &#x2013; in my string):
String sCurrentLine;
BufferedReader br = new BufferedReader(new FileReader("tmp.xml"));
StringBuffer sb = new StringBuffer();
while ((sCurrentLine = br.readLine()) != null) {
    sb.append(sCurrentLine);
}
ScienceProductBuilderInfoType rsp = (ScienceProductBuilderInfoType) unm
        .unmarshal(new InputSource(new StringReader(sb.toString())));
No luck.
Any ideas how to get the ISO-8859-1 encoded character in JAXB?
Recommended answer
Solved, using this tidbit of code found on Stack Overflow:
final class HtmlEncoder {
    private HtmlEncoder() {}

    public static <T extends Appendable> T escapeNonLatin(CharSequence sequence,
            T out) throws java.io.IOException {
        for (int i = 0; i < sequence.length(); i++) {
            char ch = sequence.charAt(i);
            if (Character.UnicodeBlock.of(ch) == Character.UnicodeBlock.BASIC_LATIN) {
                out.append(ch);
            } else {
                int codepoint = Character.codePointAt(sequence, i);
                // handle supplementary range chars
                i += Character.charCount(codepoint) - 1;
                // emit entity
                out.append("&#x");
                out.append(Integer.toHexString(codepoint));
                out.append(";");
            }
        }
        return out;
    }
}
String escaped = HtmlEncoder.escapeNonLatin(MYSTRING, new StringBuilder()).toString();
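To see what this kind of escaping produces for the string from the question, here is a compact standalone variant of the same idea. It is a sketch, not the code from the answer: `EscapeDemo` and `escapeNonAscii` are hypothetical names, and it treats every code point above U+007F as "non-Latin", which matches the range of the BASIC_LATIN block.

```java
public final class EscapeDemo {
    private EscapeDemo() {}

    /** Replaces every code point outside U+0000..U+007F with a hex character reference. */
    public static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        // codePoints() yields whole code points, so surrogate pairs stay together
        s.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "class of 10\u201311\u2010year\u2010olds";
        System.out.println(escapeNonAscii(s));
        // prints: class of 10&#x2013;11&#x2010;year&#x2010;olds
    }
}
```

Because the result is pure ASCII, it should survive any later conversion to ISO-8859-1 unchanged, which is presumably why this approach resolved the problem.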