Byte and char conversion in Java


Question


If I convert a character to byte and then back to char, that character mysteriously disappears and becomes something else. How is this possible?

This is the code:

char a = 'È';       // line 1       
byte b = (byte)a;   // line 2       
char c = (char)b;   // line 3
System.out.println((char)c + " " + (int)c);

Until line 2 everything is fine:

  • In line 1 I could print "a" in the console and it would show "È".

  • In line 2 I could print "b" in the console and it would show -56, which is 200 read as a signed byte. And 200 is "È". So it's still fine.

But what's wrong in line 3? "c" becomes something else and the program prints ? 65480. That's something completely different.

What should I write in line 3 in order to get the correct result?

Solution

A character in Java is a Unicode code unit, which is treated as an unsigned 16-bit number. So if you perform c = (char)b, the value you get is 2^16 - 56 = 65536 - 56 = 65480.

Or more precisely, the byte is first converted to a signed integer with the value 0xFFFFFFC8 using sign extension in a widening conversion. This in turn is then narrowed down to 0xFFC8 when casting to a char, which translates to the positive number 65480.
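The two steps described above can be made visible by printing the intermediate values in hexadecimal. This is only a minimal sketch; the class name SignExtensionDemo and the helper methods widen and narrow are made up for illustration and are not part of the original answer.

```java
// Hypothetical demo class: shows the widening and narrowing steps separately.
class SignExtensionDemo {
    // Widening byte -> int copies the sign bit into the upper 24 bits.
    static int widen(byte b) {
        return b;
    }

    // Narrowing int -> char keeps only the low 16 bits.
    static char narrow(int i) {
        return (char) i;
    }

    public static void main(String[] args) {
        byte b = (byte) '\u00C8';        // 'È' (U+00C8), stored as -56
        int widened = widen(b);
        char narrowed = narrow(widened);

        System.out.println(b);                             // -56
        System.out.println(Integer.toHexString(widened));  // ffffffc8
        System.out.println(Integer.toHexString(narrowed)); // ffc8
        System.out.println((int) narrowed);                // 65480
    }
}
```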

From the language specification:

5.1.4. Widening and Narrowing Primitive Conversion

First, the byte is converted to an int via widening primitive conversion (§5.1.2), and then the resulting int is converted to a char by narrowing primitive conversion (§5.1.3).


To get the right code point, use char c = (char) (b & 0xFF). The mask first converts the byte value of b to the positive integer 200 by zeroing the top 24 bits after the widening conversion: 0xFFFFFFC8 becomes 0x000000C8, or 200 in decimal.
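A small sketch of the masked conversion, applied to the code from the question. The class name UnsignedByteToChar and the helper toChar are hypothetical. Note that the round trip can only work for characters up to U+00FF, because anything larger does not fit into a single byte to begin with.

```java
// Hypothetical demo class: converts a byte to a char without sign extension.
class UnsignedByteToChar {
    // Mask off the sign-extended upper bits before narrowing to char.
    static char toChar(byte b) {
        return (char) (b & 0xFF);
    }

    public static void main(String[] args) {
        char a = '\u00C8';      // 'È'
        byte b = (byte) a;      // -56
        char c = toChar(b);     // masked conversion

        System.out.println((int) c);                // 200
        System.out.println(c == a);                 // true
        // Since Java 8, Byte.toUnsignedInt performs the same masking.
        System.out.println(Byte.toUnsignedInt(b));  // 200
    }
}
```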


Above is a direct explanation of what happens during conversion between the byte, int and char primitive types.

If you want to encode/decode characters from bytes, use Charset, CharsetEncoder, CharsetDecoder or one of the convenience methods such as new String(byte[] bytes, Charset charset) or String#getBytes(Charset charset). You can get the standard character sets (such as UTF-8 or ISO-8859-1) from StandardCharsets; others, such as Windows-1252, are available through Charset.forName.
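As an illustration of the charset-based route, the following sketch encodes the same character with two of the standard charsets and decodes it again, using new String(byte[], Charset) and String.getBytes(Charset). The class name CharsetRoundTrip and the helpers encode and decode are assumptions made for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: encodes and decodes text with an explicit charset.
class CharsetRoundTrip {
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "\u00C8"; // "È"

        byte[] latin1 = encode(text, StandardCharsets.ISO_8859_1); // one byte: 0xC8
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);        // two bytes: 0xC3 0x88

        System.out.println(latin1.length); // 1
        System.out.println(utf8.length);   // 2
        // Decoding with the charset used for encoding restores the text.
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```

Decoding with a different charset than the one used for encoding generally does not restore the original text, which is why the charset is passed explicitly on both sides.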
