What is an "internal address" in Java?


Question


In the Javadoc for Object.hashCode() it states

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)

It's a common misconception that this has something to do with the memory address, but it doesn't: the address can change without notice, whereas the hashCode() does not and must not change for an object.

@Neet provided a link to a good answer, https://stackoverflow.com/a/565416/57695, but I am looking for more details.


Here is an example to illustrate my concern

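import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class HashCodeVsAddress {
    public static void main(String[] args) throws Exception {
        // Unsafe has no public constructor, so fetch its singleton reflectively.
        Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
        theUnsafe.setAccessible(true);
        Unsafe unsafe = (Unsafe) theUnsafe.get(null);

        for (int t = 0; t < 10; t++) {
            System.gc();
            Object[] objects = new Object[10];
            for (int i = 0; i < objects.length; i++)
                objects[i] = new Object();

            for (int i = 0; i < objects.length; i++) {
                if (i > 0) System.out.print(", ");
                // Read the raw reference stored in slot i; getInt assumes 4-byte
                // references (a 32-bit JVM or compressed oops).
                int location = unsafe.getInt(objects, Unsafe.ARRAY_OBJECT_BASE_OFFSET + Unsafe.ARRAY_OBJECT_INDEX_SCALE * i);
                System.out.printf("%08x: hc= %08x", location, objects[i].hashCode());
            }
            System.out.println();
        }
    }
}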

prints

eac00038: hc= 4f47e0ba, eac00048: hc= 2342d884, eac00058: hc= 7994d431, eac00068: hc= 19f71b53, eac00078: hc= 2e22f376, eac00088: hc= 789ddfa3, eac00098: hc= 44c58432, eac000a8: hc= 036a11e4, eac000b8: hc= 28bc917c, eac000c8: hc= 73f378c8
eac00038: hc= 30813486, eac00048: hc= 729f624a, eac00058: hc= 3dee2310, eac00068: hc= 5d400f33, eac00078: hc= 18a60d19, eac00088: hc= 3da5f0f3, eac00098: hc= 596e0123, eac000a8: hc= 450cceb3, eac000b8: hc= 4bd66d2f, eac000c8: hc= 6a9a4f8e
eac00038: hc= 711dc088, eac00048: hc= 584b5abc, eac00058: hc= 3b3219ed, eac00068: hc= 564434f7, eac00078: hc= 17f17060, eac00088: hc= 6c08bae7, eac00098: hc= 3126cb1a, eac000a8: hc= 69e0312b, eac000b8: hc= 7dbc345a, eac000c8: hc= 4f114133
eac00038: hc= 50c8c3b8, eac00048: hc= 2ca98e77, eac00058: hc= 2fc83d89, eac00068: hc= 034005e1, eac00078: hc= 6041f871, eac00088: hc= 0b1df416, eac00098: hc= 5b83d60d, eac000a8: hc= 2c5a1e6b, eac000b8: hc= 5083198c, eac000c8: hc= 4f025f9f
eac00038: hc= 00c5eb8a, eac00048: hc= 41eab16b, eac00058: hc= 1726099c, eac00068: hc= 4240eca3, eac00078: hc= 346fe350, eac00088: hc= 1db4b415, eac00098: hc= 429addef, eac000a8: hc= 45609812, eac000b8: hc= 489fe953, eac000c8: hc= 7a8f6d64
eac00038: hc= 7e628e42, eac00048: hc= 7869cfe0, eac00058: hc= 6aceb8e2, eac00068: hc= 29cc3436, eac00078: hc= 1d77daaa, eac00088: hc= 27b4de03, eac00098: hc= 535bab52, eac000a8: hc= 274cbf3f, eac000b8: hc= 1f9fd541, eac000c8: hc= 3669ae9f
eac00038: hc= 772a3766, eac00048: hc= 749b46a8, eac00058: hc= 7e3bfb66, eac00068: hc= 13f62649, eac00078: hc= 054b8cdc, eac00088: hc= 230cc23b, eac00098: hc= 1aa3c177, eac000a8: hc= 74f2794a, eac000b8: hc= 5af92541, eac000c8: hc= 1afcfd10
eac00038: hc= 396e1dd8, eac00048: hc= 6c696d5c, eac00058: hc= 7d8aea9e, eac00068: hc= 2b316b76, eac00078: hc= 39862621, eac00088: hc= 16315e08, eac00098: hc= 03146a9a, eac000a8: hc= 3162a60a, eac000b8: hc= 4382f3da, eac000c8: hc= 4a578fd6
eac00038: hc= 225765b0, eac00048: hc= 17d5176d, eac00058: hc= 26f50154, eac00068: hc= 1f2a45c7, eac00078: hc= 104b1bcd, eac00088: hc= 330e3816, eac00098: hc= 6a844689, eac000a8: hc= 12330301, eac000b8: hc= 530a3ffc, eac000c8: hc= 45eee3fb
eac00038: hc= 3f9432e0, eac00048: hc= 1a9830bc, eac00058: hc= 7da79447, eac00068: hc= 04f801c4, eac00078: hc= 363bed68, eac00088: hc= 185f62a9, eac00098: hc= 1e4651bf, eac000a8: hc= 1aa0e220, eac000b8: hc= 385db088, eac000c8: hc= 0ef0cda1


As a side note, if you look at this code

if (value == 0) value = 0xBAD ;

it appears that 0xBAD is twice as likely as any other hashCode, because 0 is also mapped to this value. If you run the following long enough, you see

long count = 0, countBAD = 0;
while (true) {
    for (int i = 0; i < 200000000; i++) {
        int hc = new Object().hashCode();
        if (hc == 0xBAD)
            countBAD++;
        count++;
    }
    System.out.println("0xBAD ratio is " + (double) (countBAD << 32) / count + " times expected.");
}

prints

0xBAD ratio is 2.0183116992481205 times expected.

Solution

This is clearly implementation-specific.

Below I include the Object.hashCode() implementation used in OpenJDK 7.

The function supports six different calculation methods, only two of which take any notice of the object's address (the "address" being the C++ oop cast to intptr_t). One of the two methods uses the address as-is, whereas the other does some bit twiddling and then mashes the result with an infrequently-updated random number.
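To make the two address-based methods concrete, here is a minimal Java sketch of the same arithmetic. Java cannot observe real object addresses, so the "address" is a hypothetical long passed in by the caller; the 31-bit HASH_MASK is an assumption standing in for markOopDesc::hash_mask, and the class and method names are invented for illustration. The unsigned shift (>>>) is used in place of the C++ signed shift, which gives the same result for non-negative addresses.

```java
// Sketch only: mirrors the hashCode == 1 and hashCode == 4 branches of
// get_next_hash, using a caller-supplied hypothetical address.
final class AddressHashSketch {
    // Assumed 31-bit mask, standing in for markOopDesc::hash_mask.
    static final long HASH_MASK = 0x7FFFFFFFL;

    // hashCode == 1: drop the alignment bits, fold higher bits down,
    // then mix in the infrequently updated random number.
    static int mixed(long address, long stwRandom) {
        long addrBits = address >>> 3;
        return finish(addrBits ^ (addrBits >>> 5) ^ stwRandom);
    }

    // hashCode == 4: the address as-is.
    static int raw(long address) {
        return finish(address);
    }

    // Common tail: mask to the hash width and remap 0 to 0xBAD.
    static int finish(long value) {
        value &= HASH_MASK;
        return value == 0 ? 0xBAD : (int) value;
    }

    public static void main(String[] args) {
        long before = 0xeac00038L; // hypothetical address before the object moves
        long after = 0xeac00048L;  // hypothetical address after the object moves
        System.out.printf("mixed: %08x -> %08x%n", mixed(before, 0), mixed(after, 0));
        System.out.printf("raw:   %08x -> %08x%n", raw(before), raw(after));
    }
}
```

Both variants produce a different value once the address changes, which is why an address-derived hash has to be computed once and remembered rather than recomputed.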

Of the remaining methods, one returns a constant (presumably for testing), one returns sequential numbers, and the rest are based on pseudo-random sequences.

It would appear that the method can be chosen at runtime, and the default seems to be method 0, which is os::random(). The latter is a linear congruential generator, with an alleged race condition thrown in. :-) The race condition is acceptable because at worst it would result in two objects sharing the same hash code; this does not break any invariants.
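For reference, a Park-Miller "minimal standard" generator can be sketched in a few lines of Java. This is not HotSpot's code: the class name is invented, and the constants (multiplier 16807, modulus 2^31 - 1) are the textbook Park-Miller parameters, which I am assuming match what os::random() uses.

```java
// Sketch of a Park-Miller linear congruential generator:
// next = (16807 * seed) mod (2^31 - 1).
final class ParkMillerSketch {
    static final long A = 16807L;      // multiplier, 7^5
    static final long M = 2147483647L; // modulus, 2^31 - 1

    // Pure function of the seed; 64-bit arithmetic avoids overflow.
    static int next(int seed) {
        return (int) ((seed * A) % M);
    }

    public static void main(String[] args) {
        int seed = 1;
        for (int i = 0; i < 10000; i++) {
            seed = next(seed);
        }
        System.out.println(seed);
    }
}
```

Because next() is a pure function of the seed, two threads that read the same global seed before either writes it back compute the same value. That is the race the source comment describes, and the worst outcome is a duplicated hash code.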

The computation is performed the first time a hash code is required. To maintain consistency, the result is then stored in the object's header and is returned on subsequent calls to hashCode(). The caching is done outside this function.
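The visible effect of this caching can be checked from plain Java. The sketch below (class and method names are made up) asks for an object's identity hash, requests a few garbage collections, and confirms the value never changes. System.gc() is only a hint, so the object may or may not actually move, but the hashCode() contract requires stability either way.

```java
// Sketch: the identity hash of an object stays the same across GC requests.
final class IdentityHashStabilitySketch {
    static boolean stableAcrossGc(Object o, int rounds) {
        int first = System.identityHashCode(o); // computed and cached on first use
        for (int i = 0; i < rounds; i++) {
            System.gc(); // a hint; the collector may relocate the object
            if (System.identityHashCode(o) != first) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(stableAcrossGc(new Object(), 5));
    }
}
```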

In summary, the notion that Object.hashCode() is based on the object's address is largely a historic artefact that has been obsoleted by the properties of modern garbage collectors.

// hotspot/src/share/vm/runtime/synchronizer.cpp

// hashCode() generation :
//
// Possibilities:
// * MD5Digest of {obj,stwRandom}
// * CRC32 of {obj,stwRandom} or any linear-feedback shift register function.
// * A DES- or AES-style SBox[] mechanism
// * One of the Phi-based schemes, such as:
//   2654435761 = 2^32 * Phi (golden ratio)
//   HashCodeValue = ((uintptr_t(obj) >> 3) * 2654435761) ^ GVars.stwRandom ;
// * A variation of Marsaglia's shift-xor RNG scheme.
// * (obj ^ stwRandom) is appealing, but can result
//   in undesirable regularity in the hashCode values of adjacent objects
//   (objects allocated back-to-back, in particular).  This could potentially
//   result in hashtable collisions and reduced hashtable efficiency.
//   There are simple ways to "diffuse" the middle address bits over the
//   generated hashCode values:
//

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ;
  if (hashCode == 0) {
     // This form uses an unguarded global Park-Miller RNG,
     // so it's possible for two threads to race and generate the same RNG.
     // On MP system we'll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.
     value = os::random() ;
  } else
  if (hashCode == 1) {
     // This variation has the property of being stable (idempotent)
     // between STW operations.  This can be useful in some of the 1-0
     // synchronization schemes.
     intptr_t addrBits = intptr_t(obj) >> 3 ;
     value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
  } else
  if (hashCode == 2) {
     value = 1 ;            // for sensitivity testing
  } else
  if (hashCode == 3) {
     value = ++GVars.hcSequence ;
  } else
  if (hashCode == 4) {
     value = intptr_t(obj) ;
  } else {
     // Marsaglia's xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we'll
     // likely make this the default in future releases.
     unsigned t = Self->_hashStateX ;
     t ^= (t << 11) ;
     Self->_hashStateX = Self->_hashStateY ;
     Self->_hashStateY = Self->_hashStateZ ;
     Self->_hashStateZ = Self->_hashStateW ;
     unsigned v = Self->_hashStateW ;
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
     Self->_hashStateW = v ;
     value = v ;
  }

  value &= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD ;
  assert (value != markOopDesc::no_hash, "invariant") ;
  TEVENT (hashCode: GENERATE) ;
  return value;
}
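The final branch (Marsaglia's xor-shift with per-thread state) translates almost directly to Java. The sketch below is an illustration rather than a port: the class name is invented, the seed values in main are arbitrary non-zero numbers and not HotSpot's, and the 31-bit mask is an assumption standing in for markOopDesc::hash_mask. Java's int is signed, so >>> replaces the C++ unsigned right shift.

```java
// Sketch of the xor-shift branch of get_next_hash. One instance holds the
// four state words that HotSpot keeps per thread.
final class XorShiftHashSketch {
    static final int HASH_MASK = 0x7FFFFFFF; // assumed 31-bit hash width

    private int x, y, z, w;

    XorShiftHashSketch(int x, int y, int z, int w) {
        this.x = x;
        this.y = y;
        this.z = z;
        this.w = w;
    }

    int nextHash() {
        int t = x;
        t ^= (t << 11);
        x = y;
        y = z;
        z = w;
        int v = w;
        v = (v ^ (v >>> 19)) ^ (t ^ (t >>> 8));
        w = v;
        int value = v & HASH_MASK;        // keep only the bits that fit the header
        return value == 0 ? 0xBAD : value; // 0 is reserved to mean "no hash yet"
    }

    public static void main(String[] args) {
        // Arbitrary non-zero seeds chosen for the sketch.
        XorShiftHashSketch gen = new XorShiftHashSketch(123456789, 362436069, 521288629, 88675123);
        for (int i = 0; i < 5; i++) {
            System.out.printf("%08x%n", gen.nextHash());
        }
    }
}
```

Since each thread owns its state, there is no shared variable to race on and no cache-coherency traffic, which is presumably why the source comment calls this the best overall choice.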
