Does an array object explicitly contain the indexes?


Problem description

Since day one of learning Java I've been told by various websites and many teachers that arrays are consecutive memory locations which can store the specified number of data all of the same type.

Since an array is an object and object references are stored on the stack, and actual objects live in the heap, object references point to actual objects.

But when I came across examples of how arrays are created in memory, they always show something like this:

(In which a reference to an array object is stored on the stack and that reference points to the actual object in the heap, where there are also explicit indexes pointing to specific memory locations)

But recently I came across online notes of Java in which they stated that arrays' explicit indexes are not specified in the memory. The compiler just knows where to go by looking at the provided array index number during runtime.

Just like this:

After reading the notes, I also searched on Google regarding this matter, but the contents on this issue were either quite ambiguous or non-existent.

I need more clarification on this matter. Are array object indexes explicitly shown in memory or not? If not, then how does Java manage the commands to go to a particular location in an array during runtime?

Solution

Does an array object explicitly contain the indexes?

Short answer: No.

Longer answer: Typically not, but it theoretically could do.

Full answer:

Neither the Java Language Specification nor the Java Virtual Machine Specification makes any guarantees about how arrays are implemented internally. All they require is that array elements are accessed by an int index number having a value from 0 to length-1. How an implementation actually fetches or stores the values of those indexed elements is a detail private to the implementation.

A perfectly conformant JVM could use a hash table to implement arrays. In that case, the elements would be non-consecutive, scattered around memory, and it would need to record the indexes of elements, to know what they are. Or it could send messages to a man on the moon who writes the array values down on labeled pieces of paper and stores them in lots of little filing cabinets. I can't see why a JVM would want to do these things, but it could.
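To make the hash table idea concrete, here is a minimal sketch of what such a scheme could look like if written in ordinary Java. The class name HashBackedIntArray and its methods are hypothetical and invented for illustration; no real JVM is known to work this way. Note that here each stored element really is paired with its index, as the map key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: an "array" of ints backed by a hash table.
// Every stored element is explicitly paired with its index (the map key).
class HashBackedIntArray {
    private final int length;
    private final Map<Integer, Integer> cells = new HashMap<>();

    HashBackedIntArray(int length) {
        if (length < 0) {
            throw new NegativeArraySizeException();
        }
        this.length = length;
    }

    int length() {
        return length;
    }

    int get(int index) {
        checkIndex(index);
        // Elements never written behave like Java's default int value, 0.
        return cells.getOrDefault(index, 0);
    }

    void set(int index, int value) {
        checkIndex(index);
        cells.put(index, value);
    }

    // Same visible contract as a real array: valid indexes are 0 to length-1.
    private void checkIndex(int index) {
        if (index < 0 || index >= length) {
            throw new ArrayIndexOutOfBoundsException(index);
        }
    }

    public static void main(String[] args) {
        HashBackedIntArray a = new HashBackedIntArray(4);
        a.set(2, 99);
        System.out.println(a.get(2)); // 99
        System.out.println(a.get(0)); // 0, never written
    }
}
```

From the outside this behaves like an int array of fixed length, which is all the specifications ask for, but it pays for a boxed key and value plus hashing on every access.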

What will happen in practice? A typical JVM will allocate the storage for array elements as a flat, contiguous chunk of memory. Locating a particular element is trivial: multiply the fixed memory size of each element by the index of the wanted element and add that to the memory address of the start of the array: (index * elementSize) + startOfArray. This means that the array storage consists of nothing but raw element values, consecutively, ordered by index. There is no purpose to also storing the index value with each element, because the element's address in memory implies its index, and vice-versa. However, I don't think the diagram you show was trying to say that it explicitly stored the indexes. The diagram is simply labeling the elements on the diagram so you know what they are.
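The address calculation can be sketched in plain Java by treating a ByteBuffer as a stand-in for raw memory. This is only a simulation under stated assumptions: the class FlatArraySketch, the base offset 16 and the buffer size are made up for the example, and a real JVM does this arithmetic in native code, not through ByteBuffer.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration: a contiguous "array" of ints laid out in a raw byte buffer.
class FlatArraySketch {
    static final int ELEMENT_SIZE = Integer.BYTES; // every int element occupies 4 bytes

    // The formula from the answer: (index * elementSize) + startOfArray
    static int offsetOf(int startOfArray, int index) {
        return index * ELEMENT_SIZE + startOfArray;
    }

    static void store(ByteBuffer memory, int startOfArray, int index, int value) {
        memory.putInt(offsetOf(startOfArray, index), value);
    }

    static int load(ByteBuffer memory, int startOfArray, int index) {
        return memory.getInt(offsetOf(startOfArray, index));
    }

    public static void main(String[] args) {
        ByteBuffer memory = ByteBuffer.allocate(64); // pretend this is a piece of the heap
        int startOfArray = 16;                       // arbitrary base offset of the "array"

        // Fill a 5-element array with 10, 20, 30, 40, 50.
        for (int i = 0; i < 5; i++) {
            store(memory, startOfArray, i, (i + 1) * 10);
        }

        System.out.println(offsetOf(startOfArray, 3));     // 28 = 3 * 4 + 16
        System.out.println(load(memory, startOfArray, 3)); // 40
    }
}
```

Nothing in the buffer records that 40 lives at index 3; the index exists only as an input to the offset calculation.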

The technique of using contiguous storage and calculating the address of an element by formula is simple and extremely quick. It also has very little memory overhead, assuming programs allocate their arrays only as big as they really need. Programs depend on and expect the particular performance characteristics of arrays, so a JVM that did something weird with array storage would probably perform poorly and be unpopular. So practical JVMs will be constrained to implement contiguous storage, or something that performs similarly.

I can think of only a couple of variations on that scheme that would ever be useful:

  1. Stack-allocated or register-allocated arrays: During optimization, a JVM might determine through escape analysis that an array is only used within one method, and if the array is also a smallish fixed size, it would then be an ideal candidate object for being allocated directly on the stack, calculating the address of elements relative to the stack pointer. If the array is extremely small (fixed size of maybe up to 4 elements), a JVM could go even further and store the elements directly in CPU registers, with all element accesses unrolled & hardcoded.

  2. Packed boolean arrays: The smallest directly addressable unit of memory on a computer is typically an 8-bit byte. That means if a JVM uses a byte for each boolean element, then boolean arrays waste 7 out of every 8 bits. It would use only 1 bit per element if booleans were packed together in memory. This packing isn't done typically because extracting individual bits of bytes is slower, and it needs special consideration to be safe with multithreading. However, packed boolean arrays might make perfect sense in some memory-constrained embedded devices.

Still, neither of those variations requires every element to store its own index.
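As a rough sketch of the packed boolean layout, the following class stores 8 booleans per byte and finds each one by shifting and masking the index. The class PackedBooleanArray is hypothetical and written for illustration; it is not how any particular JVM lays out boolean[], and the standard library's java.util.BitSet offers a ready-made version of the same idea.

```java
// Hypothetical illustration: booleans packed 8 per byte, located purely by arithmetic.
class PackedBooleanArray {
    private final byte[] bits;
    private final int length;

    PackedBooleanArray(int length) {
        this.length = length;
        this.bits = new byte[(length + 7) / 8]; // round up to whole bytes
    }

    boolean get(int index) {
        checkIndex(index);
        // index >>> 3 selects the byte, index & 7 selects the bit within it.
        return (bits[index >>> 3] & (1 << (index & 7))) != 0;
    }

    void set(int index, boolean value) {
        checkIndex(index);
        int mask = 1 << (index & 7);
        // Read-modify-write of a shared byte: this is the step that is not
        // safe when several threads update neighbouring elements.
        if (value) {
            bits[index >>> 3] |= mask;
        } else {
            bits[index >>> 3] &= ~mask;
        }
    }

    int storageBytes() {
        return bits.length;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= length) {
            throw new ArrayIndexOutOfBoundsException(index);
        }
    }

    public static void main(String[] args) {
        PackedBooleanArray flags = new PackedBooleanArray(20);
        flags.set(7, true);
        flags.set(13, true);
        System.out.println(flags.get(7));         // true
        System.out.println(flags.get(8));         // false
        System.out.println(flags.storageBytes()); // 3 bytes for 20 booleans
    }
}
```

Even in this layout the index is never stored; it is split into a byte position and a bit position each time an element is accessed.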

I want to address a few other details you mentioned:

arrays store the specified number of data all of the same type

Correct.

The fact that all an array's elements are the same type is important because it means all the elements are the same size in memory. That's what allows for elements to be located by simply multiplying by their common size.

This is still technically true if the array element type is a reference type. Although in that case, the value of each element is not the object itself (which could be of varying size) but only an address which refers to an object. Also, in that case, the actual runtime type of objects referred to by each element of the array could be any subclass of the element type. E.g.,

Object[] a = new Object[4]; // array whose element type is Object
// element 0 is a reference to a String (which is a subclass of Object)
a[0] = "foo";

// element 1 is a reference to a Double (which is a subclass of Object)
a[1] = 123.45;

// element 2 is the value null (no object! although null is still assignable to Object type)
a[2] = null;

// element 3 is a reference to another array (all array classes are subclasses of Object)
a[3] = new int[] { 2, 3, 5, 7, 11 };

arrays are consecutive memory locations

As discussed above, this doesn't have to be true, although it is almost surely true in practice.

To go further, note that although the JVM might allocate a contiguous chunk of memory from the operating system, that doesn't mean it ends up being contiguous in physical RAM. The OS can give programs a virtual address space that behaves as if contiguous, but with individual pages of memory scattered in various places, including physical RAM, swap files on disk, or regenerated as needed if their contents are known to be currently blank. Even to the extent that pages of the virtual memory space are resident in physical RAM, they could be arranged in physical RAM in an arbitrary order, with complex page tables that define the mapping from virtual to physical addresses. And even if the OS thinks it is dealing with "physical RAM", it still could be running in an emulator. There can be layers upon layers upon layers, is my point, and getting to the bottom of them all to find out what's really going on takes a while!

Part of the purpose of programming language specifications is to separate the apparent behavior from the implementation details. When programming you can often program to the specification alone, free from worrying about how it happens internally. The implementation details become relevant, however, when you need to deal with the real-world constraints of limited speed and memory.

Since an array is an object and object references are stored on the stack, and actual objects live in the heap, object references point to actual objects

This is correct, except what you said about the stack. Object references can be stored on the stack (as local variables), but they can also be stored as static fields or instance fields, or as array elements as seen in the example above.

Also, as I mentioned earlier, clever implementations can sometimes allocate objects directly on the stack or in CPU registers as an optimization, although this has zero effect on your program's apparent behavior, only its performance.

The compiler just knows where to go by looking at the provided array index number during runtime.

In Java, it's not the compiler that does this, but the virtual machine. Arrays are a feature of the JVM itself, so the compiler can translate your source code that uses arrays simply to bytecode that uses arrays. Then it's the JVM's job to decide how to implement arrays, and the compiler neither knows nor cares how they work.
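A small example of that division of labour, as a sketch: the class name ArrayBytecodeSketch is made up, and the bytecode listed in the comment is what javac typically emits for such a method (it can be inspected with javap -c).

```java
// Hypothetical illustration: the compiler emits an array-load instruction and nothing more.
class ArrayBytecodeSketch {
    static int elementAt(int[] values, int index) {
        // Typical javac output for this method body:
        //   aload_0   (push the array reference)
        //   iload_1   (push the index)
        //   iaload    (the JVM fetches the element, however it stores arrays)
        //   ireturn
        // No address arithmetic appears in the bytecode.
        return values[index];
    }

    public static void main(String[] args) {
        int[] primes = { 2, 3, 5, 7, 11 };
        System.out.println(elementAt(primes, 3)); // 7

        try {
            elementAt(primes, 5);
        } catch (ArrayIndexOutOfBoundsException e) {
            // The bounds check is also performed by the JVM while executing iaload.
            System.out.println("index 5 is out of bounds");
        }
    }
}
```

The same class file runs unchanged on any conformant JVM, whichever storage layout that JVM uses for arrays.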
