Unknown bytes returned by getBytes()


Problem description

import java.io.UnsupportedEncodingException;
import java.util.Arrays;

public class Main {
 public static void main(String[] args)
 {
  try 
  {
   String s = "s";
   System.out.println( Arrays.toString( s.getBytes("utf8") ) );
   System.out.println( Arrays.toString( s.getBytes("utf16") ) );
   System.out.println( Arrays.toString( s.getBytes("utf32") ) );
  }  
  catch (UnsupportedEncodingException e) 
  {
   e.printStackTrace();
  }
 }
}

Console:


[115]
[-2, -1, 0, 115]
[0, 0, 0, 115]

What are the two leading bytes in the UTF-16 output?

[-2, -1] - ???

I also noticed that the same two bytes appear if I do this:


String s = new String(new char[]{'\u1251'});
System.out.println( Arrays.toString( s.getBytes("utf8") ) );
System.out.println( Arrays.toString( s.getBytes("utf16") ) );
System.out.println( Arrays.toString( s.getBytes("utf32") ) );

Console:


[-31, -119, -111]
[-2, -1, 18, 81]
[0, 0, 18, 81]

Solution

The -2, -1 are the bytes 0xFE 0xFF printed as signed Java bytes. Together they form the Byte Order Mark (BOM, U+FEFF), which tells a reader of the following UTF-16 text which byte order was used.
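To see why -2 and -1 correspond to 0xFE and 0xFF, it may help to print the bytes as unsigned hex. The following is a minimal sketch; the class name SignedBytesDemo and the helper toHex are made up for illustration and are not part of the original question.

```java
public class SignedBytesDemo {
    // Hypothetical helper: format a signed Java byte as two-digit unsigned hex.
    // Masking with 0xFF widens the byte to an int in the range 0..255.
    static String toHex(byte b) {
        return String.format("0x%02X", b & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(toHex((byte) -2)); // prints 0xFE
        System.out.println(toHex((byte) -1)); // prints 0xFF
    }
}
```

Java bytes are signed 8-bit values, so any byte of 0x80 or above shows up as a negative number in Arrays.toString.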

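You get this because UTF-16 stores each 16-bit code unit as 2 bytes, and those bytes can be written in either Big-Endian (UTF-16BE) or Little-Endian (UTF-16LE) order. When you ask Java for plain "UTF-16" without naming a byte order, the encoder writes a BOM first so that a decoder can tell which order was used. UTF-8 is a byte-oriented encoding with no byte-order variants, so no BOM is needed there. UTF-32 also has Big-Endian and Little-Endian variants, but as the output above shows, Java's UTF-32 encoder writes big-endian bytes without a BOM.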

As the values that come back are 0xFE 0xFF, the text that follows is in Big-Endian order (UTF-16BE). A Little-Endian BOM would appear as 0xFF 0xFE instead.
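If the BOM is not wanted, one option is to name the byte order explicitly. The sketch below compares the three UTF-16 charsets from java.nio.charset.StandardCharsets; the class name Utf16BomDemo and the helper encode are made up for illustration. Using the Charset overload of getBytes also avoids the checked UnsupportedEncodingException.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16BomDemo {
    // Hypothetical helper: encode text with the given charset and
    // return the bytes in the same format the question prints them.
    static String encode(String text, Charset charset) {
        return Arrays.toString(text.getBytes(charset));
    }

    public static void main(String[] args) {
        String s = "s";

        // Plain UTF-16: big-endian BOM (0xFE 0xFF) followed by big-endian data.
        System.out.println(encode(s, StandardCharsets.UTF_16));   // [-2, -1, 0, 115]

        // Explicit byte order: no BOM is written.
        System.out.println(encode(s, StandardCharsets.UTF_16BE)); // [0, 115]
        System.out.println(encode(s, StandardCharsets.UTF_16LE)); // [115, 0]

        // Decoding with plain UTF-16 consumes the BOM, so the round trip is lossless.
        byte[] withBom = s.getBytes(StandardCharsets.UTF_16);
        System.out.println(new String(withBom, StandardCharsets.UTF_16).equals(s)); // true
    }
}
```

The BOM is only a problem if the bytes are later decoded with a charset that does not consume it (for example UTF-16BE), in which case U+FEFF ends up as the first character of the string.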
