Java 8 UTF-8 encoding issue (java bug?)

Views: 415  Published: 2017/8/16 19:31:56  Tags: java encoding utf-8 java-8

Problem description

There is an inconsistency when creating a String with UTF-8 encoding.

Run this code:

public static void encodingIssue() throws IOException {
    byte[] array = new byte[3];
    array[0] = (byte) -19;
    array[1] = (byte) -69;
    array[2] = (byte) -100;

    String str = new String(array, "UTF-8");
    for (char c : str.toCharArray()) {
        System.out.println((int) c);
    }
}

On Java 1.8.0_20 (and earlier versions) we have the result

On Java 1.7 and 1.6 we have the correct result:

Have you encountered this error? Is there a workaround for this?

The same inconsistency also shows up for Shift_JIS, JIS_X0212-1990, x-IBM300, x-IBM834, x-IBM942, x-IBM942C and x-JIS0208, but UTF-8 is obviously the most urgent case.

Solution

It is a property of the "Modified UTF-8" encoding to store surrogate pairs (or even unpaired chars of that range) like individual characters. And it’s an error if a decoder claiming to use standard UTF-8 uses "Modified UTF-8". This seems to have been fixed with Java 8.
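To make the point above concrete, here is a minimal sketch of why the three bytes from the question are a problem for standard UTF-8. The class name `SurrogateBytes` and both helper methods are hypothetical names introduced only for this illustration. The first helper combines the payload bits of a three-byte sequence by hand. For -19, -69, -100 (0xED 0xBB 0x9C) this gives 0xDEDC (57052 decimal), which lies in the low-surrogate range. The second helper asks the JDK's UTF-8 decoder to report malformed input instead of replacing it. On Java 8 and later it should reject these bytes, which matches the answer's reading that the stricter behaviour is the intended one.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class SurrogateBytes {

    /** Combines the payload bits of a three-byte sequence (1110xxxx 10xxxxxx 10xxxxxx). */
    static int decodeThreeBytes(byte b0, byte b1, byte b2) {
        return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F);
    }

    /** Returns true if a UTF-8 decoder that reports errors (instead of replacing them) accepts the bytes. */
    static boolean isStrictUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] array = {(byte) -19, (byte) -69, (byte) -100}; // 0xED 0xBB 0x9C
        int value = decodeThreeBytes(array[0], array[1], array[2]);
        System.out.printf("U+%04X (%d)%n", value, value);
        System.out.println("low surrogate: " + Character.isLowSurrogate((char) value));
        System.out.println("strict UTF-8:  " + isStrictUtf8(array));
    }
}
```

Checking input with a reporting decoder like this is one way to detect such data up front, rather than relying on whatever `new String(bytes, "UTF-8")` happens to do on a given JDK.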

You can reliably read such data using a method that is specified to use "Modified UTF-8":

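// Requires java.nio.ByteBuffer, java.io.ByteArrayInputStream and java.io.DataInputStream.
// readUTF() expects a two-byte length prefix followed by the Modified UTF-8 bytes.
ByteBuffer bb = ByteBuffer.allocate(array.length + 2);
bb.putShort((short) array.length).put(array);
ByteArrayInputStream bis = new ByteArrayInputStream(bb.array());
DataInputStream dis = new DataInputStream(bis);
String str = dis.readUTF();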
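The workaround above can be wrapped in a small self-contained helper, sketched here under a few assumptions. The class name `ModifiedUtf8` and the method `decode` are hypothetical. The length check is an addition that reflects the two-byte length prefix `readUTF()` reads. The `IOException` is rethrown unchecked purely for convenience.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

public class ModifiedUtf8 {

    /** Decodes bytes as "Modified UTF-8" by prepending the length prefix that readUTF() expects. */
    static String decode(byte[] data) {
        if (data.length > 0xFFFF) {
            // The length prefix is an unsigned 16-bit value, so longer input cannot be framed this way.
            throw new IllegalArgumentException("input longer than 65535 bytes");
        }
        ByteBuffer bb = ByteBuffer.allocate(data.length + 2);
        bb.putShort((short) data.length).put(data);
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(bb.array()))) {
            return dis.readUTF();
        } catch (IOException e) {
            // Includes UTFDataFormatException for bytes that are not valid Modified UTF-8.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] array = {(byte) -19, (byte) -69, (byte) -100};
        for (char c : decode(array).toCharArray()) {
            System.out.println((int) c);
        }
    }
}
```

For the three bytes from the question, `decode` returns a one-character string holding the unpaired surrogate 0xDEDC, so the loop prints 57052.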
