Why does '(int)(char)(byte)-2' produce 65534 in Java?

Question

I encountered this question in a technical test for a job. Given the following code example:

public class Manager {
    public static void main (String args[]) {
        System.out.println((int) (char) (byte) -2);
    }
}

The output is 65534.

This behavior occurs only for negative values; for 0 and positive numbers the output is the same value that was passed to SOP (System.out.println). The byte cast here is insignificant; I have tried it without the cast.

So my question is: what exactly is going on here?

Recommended answer

There are some prerequisites we need to agree on before you can understand what is happening here. Once you understand the following points, the rest is simple deduction:


  1. All primitive types within the JVM are represented as a sequence of bits. The int type is represented by 32 bits, the char and short types by 16 bits and the byte type is represented by 8 bits.

All JVM numbers are signed, except for the char type, which is the only unsigned "number". When a number is signed, its highest bit represents the sign: 0 stands for a non-negative number (positive or zero) and 1 for a negative number. Signed numbers use two's complement notation, in which negative values run in the inverse order of the positive ones: -1 is all ones, and each further negative value is one step lower in the bit pattern. For example, non-negative byte values are represented in bits as follows:

00 00 00 00 => (byte) 0
00 00 00 01 => (byte) 1
00 00 00 10 => (byte) 2
...
01 11 11 11 => (byte) Byte.MAX_VALUE

while the bit order for negative numbers is inverted:

11 11 11 11 => (byte) -1
11 11 11 10 => (byte) -2
11 11 11 01 => (byte) -3
...
10 00 00 00 => (byte) Byte.MIN_VALUE

This inverted notation also explains why the negative range holds one more number than the positive range: half of the bit patterns start with 0, and one of those must represent 0 itself. Remember that all of this is only a matter of interpreting a bit pattern. Negative numbers could be encoded differently, but this notation is quite handy because it allows for some rather fast transformations, as a small example later on will show.
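The two bit tables above can be reproduced with a minimal Java sketch that prints the low 8 bits of a few byte values. The class name `ByteBits` and its `bits` helper are illustrative and not part of the original answer; the `& 0xFF` mask assumes we only want the byte's own 8 bits, dropping the extra bits that appear when the byte is widened to int for printing.

```java
// Illustrative sketch: print the two's complement bit patterns of some bytes.
class ByteBits {
    // Hypothetical helper: the low 8 bits of b as a zero-padded binary string.
    static String bits(byte b) {
        // b is widened to int here; the mask keeps only its original 8 bits.
        String s = Integer.toBinaryString(b & 0xFF);
        return "0".repeat(8 - s.length()) + s;
    }

    public static void main(String[] args) {
        byte[] samples = {0, 1, 2, Byte.MAX_VALUE, -1, -2, -3, Byte.MIN_VALUE};
        for (byte b : samples) {
            System.out.println(bits(b) + " => (byte) " + b);
        }
    }
}
```

Running it prints, among other lines, `11111110 => (byte) -2` and `10000000 => (byte) -128`, matching the tables.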

As mentioned, this does not apply to the char type. The char type represents a Unicode character with a non-negative "numeric range" of 0 to 65535. Each of these numbers refers to a 16-bit Unicode (UTF-16) code unit.
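A quick check of the char range described above, as a minimal sketch (the class `CharRange` and its `code` helper are illustrative names): `Character.MIN_VALUE` and `Character.MAX_VALUE` widen to 0 and 65535, and `Character.SIZE` reports 16 bits.

```java
// Illustrative sketch: char is an unsigned 16-bit type.
class CharRange {
    // Hypothetical helper: the numeric value of a char, widened to int.
    static int code(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println(code(Character.MIN_VALUE)); // 0
        System.out.println(code(Character.MAX_VALUE)); // 65535
        System.out.println(Character.SIZE);            // 16
        System.out.println(code('A'));                 // 65
    }
}
```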

When converting between the int, byte, short and char types, the JVM needs to either add or truncate bits.

If the target type is represented by more bits than the type from which it is converted, the JVM simply fills the additional slots with the value of the highest bit of the given value (which represents the sign), a process known as sign extension:

|     short   |     byte    |
|             | 00 00 00 01 | => (byte) 1
| 00 00 00 00 | 00 00 00 01 | => (short) 1

Thanks to the inverted notation, this strategy also works for negative numbers:

|     short   |     byte    |
|             | 11 11 11 11 | => (byte) -1
| 11 11 11 11 | 11 11 11 11 | => (short) -1

This way, the value's sign is retained. Without going into the details of how a JVM implements this, note that this model allows a cast to be performed by a cheap shift operation, which is obviously advantageous.
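To see sign extension in action, the minimal sketch below widens two byte values and compares Java's `(byte)` cast with a shift-based formulation. The helper `signExtend8` is hypothetical: it assumes the byte pattern sits in the low 8 bits of an int and relies on `>>` being an arithmetic shift that copies the sign bit. This only illustrates the idea; a real JVM is free to implement the conversion differently.

```java
// Illustrative sketch: widening a byte copies its sign bit into the new
// high-order bits (sign extension).
class SignExtension {
    // Hypothetical helper: sign-extend the low 8 bits of x to a full int.
    // << moves the byte's sign bit into bit 31; >> (arithmetic) copies it back down.
    static int signExtend8(int x) {
        return (x << 24) >> 24;
    }

    public static void main(String[] args) {
        byte one = 1, minusOne = -1;
        short s1 = one;      // implicit widening: bits 00000000 00000001
        short s2 = minusOne; // implicit widening: bits 11111111 11111111
        System.out.println(s1 + " " + s2);                    // 1 -1
        System.out.println(Integer.toHexString(s2 & 0xFFFF)); // ffff

        // The shift formulation agrees with Java's (byte) cast widened to int.
        for (int x = -1000; x <= 1000; x++) {
            if (signExtend8(x) != (byte) x) throw new AssertionError(x);
        }
        System.out.println(signExtend8(0xFE)); // -2
    }
}
```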

An exception to this rule is widening a char, which, as said before, is unsigned. Widening a char always fills the additional bits with 0, because a char has no sign and therefore needs no sign extension. A conversion of a char to an int is therefore performed as:

|           int           |    char     |     byte    |
|                         | 11 11 11 11 | 11 11 11 11 | => (char) \uFFFF
| 00 00 00 00 00 00 00 00 | 11 11 11 11 | 11 11 11 11 | => (int) 65535
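The difference between zero extension and sign extension shows up when the same 16-bit pattern is stored once as a char and once as a short. In this minimal sketch (the class `CharWidening` and its overloaded `widen` helpers are illustrative names), the char widens to 65535 while the short widens to -1.

```java
// Illustrative sketch: the same 16 bits widen differently as char and as short.
class CharWidening {
    static int widen(char c)  { return c; } // zero extension: high 16 bits are 0
    static int widen(short s) { return s; } // sign extension: high 16 bits copy bit 15

    public static void main(String[] args) {
        char c = (char) 0xFFFF;   // bit pattern 11111111 11111111
        short s = (short) 0xFFFF; // same bit pattern, read as signed
        System.out.println(widen(c) + " " + Integer.toHexString(widen(c))); // 65535 ffff
        System.out.println(widen(s) + " " + Integer.toHexString(widen(s))); // -1 ffffffff
    }
}
```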

When the original type has more bits than the target type, the additional bits are simply cut off. As long as the original value fits into the target type, this works fine, as in the following conversions of a short to a byte:

|     short   |     byte    |
| 00 00 00 00 | 00 00 00 01 | => (short) 1
|             | 00 00 00 01 | => (byte) 1
| 11 11 11 11 | 11 11 11 11 | => (short) -1
|             | 11 11 11 11 | => (byte) -1

However, if the value is too big or too small, this no longer works:

|     short   |     byte    |
| 00 00 00 01 | 00 00 00 01 | => (short) 257
|             | 00 00 00 01 | => (byte) 1
| 11 11 11 11 | 00 00 00 00 | => (short) -256
|             | 00 00 00 00 | => (byte) 0
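This truncation can be checked directly in Java. In the minimal sketch below (class and method names are illustrative), narrowing a short to a byte keeps only the low 8 bits, so 257 (0x0101) becomes 1 and -256 (0xFF00) becomes 0; the loop confirms that the cast matches masking with `0xFF` for every short value.

```java
// Illustrative sketch: narrowing a short to a byte keeps only the low 8 bits,
// so out-of-range values wrap around instead of being clamped.
class Narrowing {
    static byte narrow(short s) {
        return (byte) s; // the high 8 bits are discarded
    }

    public static void main(String[] args) {
        short[] samples = {1, -1, 257, -256};
        for (short s : samples) {
            System.out.printf("(short) %d -> (byte) %d%n", s, narrow(s));
        }
        // Same result as masking the low 8 bits and reinterpreting them as signed.
        for (int v = Short.MIN_VALUE; v <= Short.MAX_VALUE; v++) {
            short s = (short) v;
            if (narrow(s) != (byte) (s & 0xFF)) throw new AssertionError(v);
        }
    }
}
```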

This is why narrowing casts sometimes lead to strange results. You might wonder why narrowing is implemented this way. You could argue that it would be more intuitive if the JVM checked a number's range and instead cast an out-of-range number to the largest representable value of the same sign. However, this would require branching, which is a costly operation. This is particularly important because two's complement notation allows for cheap arithmetic operations.
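The trade-off can be sketched by contrasting Java's truncating cast with a hypothetical saturating cast. `saturate` is not something Java or the JVM provides; it only illustrates the range check described above. The final loop illustrates one reason truncation pairs well with two's complement: the low 8 bits of a sum depend only on the low 8 bits of the operands, so truncating before or after adding yields the same byte.

```java
// Illustrative sketch: Java's truncating narrowing versus a hypothetical
// saturating cast that Java and the JVM do NOT provide.
class SaturatingVsTruncating {
    // What (byte) actually does: keep the low 8 bits.
    static byte truncate(int x) {
        return (byte) x;
    }

    // Hypothetical alternative: clamp to the byte range, which needs
    // comparisons on every conversion.
    static byte saturate(int x) {
        if (x > Byte.MAX_VALUE) return Byte.MAX_VALUE;
        if (x < Byte.MIN_VALUE) return Byte.MIN_VALUE;
        return (byte) x;
    }

    public static void main(String[] args) {
        for (int x : new int[] {1, -1, 200, 257, -256}) {
            System.out.printf("%5d  truncate=%4d  saturate=%4d%n", x, truncate(x), saturate(x));
        }
        // With truncation, the low 8 bits of a sum depend only on the low
        // 8 bits of the operands: truncating before or after adding agrees.
        for (int a = -300; a <= 300; a++) {
            for (int b = -300; b <= 300; b++) {
                if (truncate(a + b) != truncate(truncate(a) + truncate(b))) {
                    throw new AssertionError(a + ", " + b);
                }
            }
        }
    }
}
```

For 200, for example, `truncate` yields -56 while `saturate` yields 127.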

With all this information, we can see what happens with the number -2 in your example:

|           int           |    char     |     byte    |
| 11 11 11 11 11 11 11 11 | 11 11 11 11 | 11 11 11 10 | => (int) -2
|                         |             | 11 11 11 10 | => (byte) -2
|                         | 11 11 11 11 | 11 11 11 10 | => (char) \uFFFE
| 00 00 00 00 00 00 00 00 | 11 11 11 11 | 11 11 11 10 | => (int) 65534
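The table can be reproduced step by step. This minimal sketch (the class `CastTrace` and its `bits` and `castChain` helpers are illustrative) stores each intermediate value of `(int)(char)(byte)-2` in a variable and prints its bit pattern at the width of its type.

```java
// Illustrative sketch: trace each step of (int)(char)(byte)-2.
class CastTrace {
    // Hypothetical helper: the low 'width' bits of v, zero-padded.
    static String bits(int v, int width) {
        String s = Integer.toBinaryString(v);
        // toBinaryString of a negative int yields all 32 bits; keep the low 'width'.
        if (s.length() > width) s = s.substring(s.length() - width);
        return "0".repeat(width - s.length()) + s;
    }

    static int castChain(int x) {
        return (int) (char) (byte) x;
    }

    public static void main(String[] args) {
        int i = -2;
        byte b = (byte) i; // keep the low 8 bits: 11111110
        char c = (char) b; // sign-extend, then keep the low 16 bits: 0xFFFE
        int r = c;         // zero-extend the char: 65534

        System.out.println(bits(i, 32) + " => (int) " + i);
        System.out.println(bits(b, 8) + " => (byte) " + b);
        System.out.println(bits(c, 16) + " => (char) 0x" + Integer.toHexString(c).toUpperCase());
        System.out.println(bits(r, 32) + " => (int) " + r);
    }
}
```

The last line printed is `00000000000000001111111111111110 => (int) 65534`.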

As you can see, the byte cast is redundant here: -2 fits into a byte, so the cast to char cuts off exactly the same bits.
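Whether the byte cast matters depends on the value. In the minimal sketch below (names are illustrative), dropping it changes nothing for -2 or 5, which already fit into a byte, but it does change the result for 200 and -200, because the byte cast discards their high bits before the char cast sees them.

```java
// Illustrative sketch: the byte cast only drops out when the value fits in a byte.
class ByteCastRedundancy {
    static int withByte(int x)    { return (int) (char) (byte) x; }
    static int withoutByte(int x) { return (int) (char) x; }

    public static void main(String[] args) {
        for (int x : new int[] {-2, 5, 200, -200}) {
            System.out.printf("%5d: with byte cast=%5d, without=%5d%n",
                    x, withByte(x), withoutByte(x));
        }
    }
}
```

For 200 the two variants give 65480 and 200; for -200 they give 56 and 65336.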

All this is also specified by the JVMS, if you prefer a more formal definition of all these rules.

One final remark: a type's bit size does not necessarily reflect the number of bits the JVM reserves for representing that type in memory. In fact, for computations the JVM does not distinguish between the boolean, byte, short, char and int types. All of them are represented by the same JVM type (int), and the virtual machine merely emulates casts between them with dedicated instructions. On a method's operand stack and in its local variables, every value of these types occupies a single 32-bit slot. This is, however, not true for arrays and object fields, which any JVM implementer can handle at will.
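The JVMS describes `i2b` and `i2s` as truncating an int and sign-extending the result back to int, and `i2c` as truncating and zero-extending. The sketch below imitates those instructions with plain int arithmetic; the method names merely mirror the opcode names and are hypothetical Java helpers, not a JVM API. A randomized loop checks them against Java's own casts.

```java
import java.util.Random;

// Illustrative sketch: narrowing conversions on a 32-bit int "slot".
class IntSlotConversions {
    static int i2b(int v) { return (v << 24) >> 24; } // keep 8 bits, sign-extend
    static int i2s(int v) { return (v << 16) >> 16; } // keep 16 bits, sign-extend
    static int i2c(int v) { return v & 0xFFFF; }      // keep 16 bits, zero-extend

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int n = 0; n < 1_000_000; n++) {
            int v = rnd.nextInt();
            if (i2b(v) != (byte) v || i2s(v) != (short) v || i2c(v) != (char) v) {
                throw new AssertionError(v);
            }
        }
        System.out.println(i2c(i2b(-2))); // 65534, the result from the question
    }
}
```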
