Unknown bytes returned by method getBytes()
Problem description
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

public class Main {
    public static void main(String[] args)
    {
        try
        {
            String s = "s";
            System.out.println( Arrays.toString( s.getBytes("utf8") ) );
            System.out.println( Arrays.toString( s.getBytes("utf16") ) );
            System.out.println( Arrays.toString( s.getBytes("utf32") ) );
        }
        catch (UnsupportedEncodingException e)
        {
            e.printStackTrace();
        }
    }
}
Console:
[115]
[-2, -1, 0, 115]
[0, 0, 0, 115]
What is this? Where do the extra bytes [-2, -1] in the UTF-16 output come from?
I also noticed that if I do this:
String s = new String(new char[]{'\u1251'});
System.out.println( Arrays.toString( s.getBytes("utf8") ) );
System.out.println( Arrays.toString( s.getBytes("utf16") ) );
System.out.println( Arrays.toString( s.getBytes("utf32") ) );
Console:
[-31, -119, -111]
[-2, -1, 18, 81]
[0, 0, 18, 81]
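Answer:
The -2, -1 is a Byte Order Mark (BOM, U+FEFF), which indicates that the text following it is encoded in UTF-16. Java's byte type is signed, so the unsigned bytes 0xFE and 0xFF print as -2 and -1. The bytes after the BOM are the character itself: 0, 115 is 0x00 0x73 ('s'), and 18, 81 is 0x12 0x51, i.e. U+1251.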
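To see the BOM as the hex values 0xFE 0xFF rather than -2 and -1, each signed byte can be masked with 0xFF before formatting. A minimal sketch; the class name BomHex and the helper toHex are made up for illustration:

```java
import java.nio.charset.StandardCharsets;

public class BomHex {
    // Java's byte is signed (-128..127), so 0xFE prints as -2 and 0xFF as -1.
    // Masking with 0xFF widens each byte to its unsigned value (0..255) before formatting.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] utf16 = "s".getBytes(StandardCharsets.UTF_16);
        System.out.println(toHex(utf16)); // FE FF 00 73
    }
}
```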
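You are probably getting this because UTF-8 is byte-oriented and has no byte order to signal, whereas UTF-16 comes in two byte orders, UTF-16BE and UTF-16LE, which store the two bytes of each 16-bit code unit in big-endian or little-endian order. (UTF-32 likewise has big-endian and little-endian forms, but, as the output above shows, Java's "utf32" encoder writes big-endian bytes without a BOM.)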
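The byte orders can be compared directly by encoding the same character with the generic and the explicit charsets. A minimal sketch; the class name ByteOrderVariants and the helper encode are made up for illustration, and the expected arrays follow from U+1251 being the code unit 0x1251:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteOrderVariants {
    // Encodes the same string with a given charset so the byte layouts can be compared.
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    public static void main(String[] args) {
        String s = "\u1251";
        // "UTF-16": big-endian code units preceded by a big-endian BOM (FE FF)
        System.out.println(Arrays.toString(encode(s, StandardCharsets.UTF_16)));     // [-2, -1, 18, 81]
        // "UTF-16BE": same byte order, no BOM
        System.out.println(Arrays.toString(encode(s, StandardCharsets.UTF_16BE)));   // [18, 81]
        // "UTF-16LE": the two bytes of the code unit swapped, no BOM
        System.out.println(Arrays.toString(encode(s, StandardCharsets.UTF_16LE)));   // [81, 18]
        // UTF-32 has byte orders too: "UTF-32LE" puts the lowest byte first
        System.out.println(Arrays.toString(encode(s, Charset.forName("UTF-32LE")))); // [81, 18, 0, 0]
    }
}
```

Asking for UTF_16BE or UTF_16LE explicitly is one way to get the raw code units without a BOM.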
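As the bytes that come back are 0xFE 0xFF, the encoding is UTF-16BE. This matches the java.nio.charset.Charset documentation: when encoding, the "UTF-16" charset uses big-endian byte order and writes a big-endian byte order mark.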
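The BOM also matters when reading the bytes back. Per the Charset documentation, the generic "UTF-16" decoder uses a leading BOM to pick the byte order and drops it, while "UTF-16BE" treats a leading FE FF as an ordinary U+FEFF character. A minimal sketch; the class name BomDecoding and its two helpers are made up for illustration:

```java
import java.nio.charset.StandardCharsets;

public class BomDecoding {
    // Generic "UTF-16": reads the leading BOM to choose the byte order, then drops it.
    static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    // Explicit "UTF-16BE": a leading FE FF is kept as the character U+FEFF.
    static String decodeUtf16Be(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        byte[] bigEndianWithBom = {(byte) 0xFE, (byte) 0xFF, 0x12, 0x51};
        byte[] littleEndianWithBom = {(byte) 0xFF, (byte) 0xFE, 0x51, 0x12};

        System.out.println(decodeUtf16(bigEndianWithBom).equals("\u1251"));    // true
        System.out.println(decodeUtf16(littleEndianWithBom).equals("\u1251")); // true: BOM selects little-endian
        System.out.println(decodeUtf16Be(bigEndianWithBom).length());          // 2: U+FEFF followed by U+1251
    }
}
```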