Why doesn't Java have true multidimensional arrays?



The TL;DR version, for those who don't want the background, is the following specific question:

Question

Why doesn't Java have an implementation of true multidimensional arrays? Is there a solid technical reason? What am I missing here?

Background

Java has multidimensional arrays at the syntax level, in that one can declare

int[][] arr = new int[10][10];

but it seems that this is really not what one might have expected. Rather than having the JVM allocate a contiguous block of RAM big enough to store 100 ints, it comes out as an array of arrays of ints: so each layer is a contiguous block of RAM, but the thing as a whole is not. Accessing arr[i][j] is thus rather slow: the JVM has to

  1. find the int[] stored at arr[i];
  2. index this to find the int stored at arr[i][j].

This involves querying an object to go from one layer to the next, which is rather expensive.
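To make the two steps concrete, here is a minimal sketch that spells out what arr[i][j] does. The class and method names are hypothetical and only serve the illustration; how much each step actually costs depends on the JVM and JIT.

```java
// Illustrative only: RowLookupDemo and get are hypothetical names.
class RowLookupDemo {
    // Reads arr[i][j] as the two separate steps described above.
    static int get(int[][] arr, int i, int j) {
        int[] row = arr[i]; // step 1: load the reference to the inner int[]
        return row[j];      // step 2: index into that inner array
    }

    public static void main(String[] args) {
        int[][] arr = new int[10][10];
        arr[2][7] = 42;
        // Same element as arr[2][7], reached via an intermediate reference.
        System.out.println(get(arr, 2, 7));
        // Each row is its own object, so distinct rows are distinct references.
        System.out.println(arr[0] != arr[1]);
    }
}
```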

Why Java does this

At one level, it's not hard to see why this can't be optimised to a simple scale-and-add lookup even if it were all allocated in one fixed block. The problem is that arr[3] is a reference all of its own, and it can be changed. So although arrays are of fixed size, we could easily write

arr[3] = new int[11];

and now the scale-and-add is screwed because this layer has grown. You'd need to know at runtime whether everything is still the same size as it used to be. In addition, of course, this will then get allocated somewhere else in RAM (it'll have to be, since it's bigger than what it's replacing), so it's not even in the right place for scale-and-add.
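As a rough sketch of the runtime check this implies, the hypothetical helper below tests whether an int[][] is still rectangular. It is not something the JVM does; it only shows the kind of condition that would have to hold for scale-and-add to be valid.

```java
// Illustrative only: ShapeCheckDemo and isRectangular are hypothetical names.
class ShapeCheckDemo {
    // Returns true if every row exists and has the same length as row 0.
    static boolean isRectangular(int[][] arr) {
        if (arr.length == 0) return true;
        if (arr[0] == null) return false;
        int width = arr[0].length;
        for (int[] row : arr)
            if (row == null || row.length != width)
                return false;
        return true;
    }

    public static void main(String[] args) {
        int[][] arr = new int[10][10];
        System.out.println(isRectangular(arr)); // all rows have length 10
        arr[3] = new int[11];                   // legal: replaces one row
        System.out.println(isRectangular(arr)); // row 3 now has length 11
    }
}
```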

What's problematic about it

It seems to me that this is not ideal, for two reasons.

For one, it's slow. A test I ran with these methods for summing the contents of a single dimensional or multidimensional array took nearly twice as long (714 seconds vs 371 seconds) for the multidimensional case (an int[1000000] and an int[100][100][100] respectively, filled with random int values, run 1000000 times with warm cache).

public static long sumSingle(int[] arr) {
    long total = 0;
    for (int i=0; i<arr.length; i++)
        total+=arr[i];
    return total;
}

public static long sumMulti(int[][][] arr) {
    long total = 0;
    for (int i=0; i<arr.length; i++)
        for (int j=0; j<arr[0].length; j++)
            for (int k=0; k<arr[0][0].length; k++)
                total+=arr[i][j][k];
    return total;
}   

Secondly, because it's slow, it encourages obscure coding. If you encounter something performance-critical that would naturally be done with a multidimensional array, you have an incentive to write it as a flat array, even if that makes the code unnatural and hard to read. You're left with an unpalatable choice: obscure code or slow code.
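For illustration, the flat-array workaround might look something like the sketch below. The names FlatSumDemo, index and sumFlat are hypothetical, and the row-major layout is an assumption; the point is only that the index arithmetic has to be written by hand.

```java
// Illustrative only: a hand-written row-major mapping over a flat int[].
class FlatSumDemo {
    // Maps (i, j, k) in a block with inner dimensions d2 and d3 to a flat position.
    static int index(int i, int j, int k, int d2, int d3) {
        return (i * d2 + j) * d3 + k;
    }

    // Sums a d1 x d2 x d3 block stored in a single flat array.
    static long sumFlat(int[] flat, int d1, int d2, int d3) {
        long total = 0;
        for (int i = 0; i < d1; i++)
            for (int j = 0; j < d2; j++)
                for (int k = 0; k < d3; k++)
                    total += flat[index(i, j, k, d2, d3)];
        return total;
    }
}
```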

What could be done about it

It seems to me that the basic problem could easily enough be fixed. The only reason, as we saw earlier, that it can't be optimised is that the structure might change. But Java already has a mechanism for making references unchangeable: declare them as final.

Now, just declaring it with

final int[][] arr = new int[10][10];

isn't good enough because it's only arr that is final here: arr[3] still isn't, and could be changed, so the structure might still change. But if we had a way of declaring things so that it was final throughout, except at the bottom layer where the int values are stored, then we'd have an entire immutable structure, and it could all be allocated as one block, and indexed with scale-and-add.
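A small sketch of that limitation, using a hypothetical class name: final stops the variable from being reassigned, but the references stored inside the outer array can still be replaced.

```java
// Illustrative only: FinalReferenceDemo and build are hypothetical names.
class FinalReferenceDemo {
    static int[][] build() {
        final int[][] arr = new int[10][10];
        // arr = new int[5][5]; // would not compile: arr itself is final
        arr[3] = new int[11];   // compiles: the element reference is not final
        return arr;
    }

    public static void main(String[] args) {
        int[][] arr = build();
        System.out.println(arr[0].length); // an untouched row
        System.out.println(arr[3].length); // the replaced row
    }
}
```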

How it would look syntactically, I'm not sure (I'm not a language designer). Maybe

final int[final][] arr = new int[10][10];

although admittedly that looks a bit weird. This would mean: final at the top layer; final at the next layer; not final at the bottom layer (else the int values themselves would be immutable).

Finality throughout would enable the JIT compiler to optimise this to give performance comparable to that of a single-dimensional array, which would then take away the temptation to code that way just to get round the slowness of multidimensional arrays.

(I hear a rumour that C# does something like this, although I also hear another rumour that the CLR implementation is so bad that it's not worth having... perhaps they're just rumours...)

Question

So why doesn't Java have an implementation of true multidimensional arrays? Is there a solid technical reason? What am I missing here?

Update

A bizarre side note: the difference in timings drops away to only a few percent if you use an int for the running total rather than a long. Why would there be such a small difference with an int, and such a big difference with a long?

Benchmarking code

Code I used for benchmarking, in case anyone wants to try to reproduce these results:

import java.util.Random;

public class Multidimensional {

    public static long sumSingle(final int[] arr) {
        long total = 0;
        for (int i=0; i<arr.length; i++)
            total+=arr[i];
        return total;
    }

    public static long sumMulti(final int[][][] arr) {
        long total = 0;
        for (int i=0; i<arr.length; i++)
            for (int j=0; j<arr[0].length; j++)
                for (int k=0; k<arr[0][0].length; k++)
                    total+=arr[i][j][k];
        return total;
    }   

    public static void main(String[] args) {
        final int iterations = 1000000;

        Random r = new Random();
        int[] arr = new int[1000000];
        for (int i=0; i<arr.length; i++)
            arr[i]=r.nextInt();
        long total = 0;
        System.out.println(sumSingle(arr));
        long time = System.nanoTime();
        for (int i=0; i<iterations; i++)
            total = sumSingle(arr);
        time = System.nanoTime()-time;
        System.out.printf("Took %d ms for single dimension\n", time/1000000, total);

        int[][][] arrMulti = new int[100][100][100];
        for (int i=0; i<arrMulti.length; i++)
            for (int j=0; j<arrMulti[i].length; j++)
                for (int k=0; k<arrMulti[i][j].length; k++)
                    arrMulti[i][j][k]=r.nextInt();
        System.out.println(sumMulti(arrMulti));
        time = System.nanoTime();
        for (int i=0; i<iterations; i++)
            total = sumMulti(arrMulti);
        time = System.nanoTime()-time;
        System.out.printf("Took %d ms for multi dimension\n", time/1000000, total);
    }

}

Solution

"but it seems that this is really not what one might have expected."

Why?

Consider that the form T[] means "array of type T". Then, just as we would expect int[] to mean "array of type int", we would expect int[][] to mean "array of type array of type int", because there's no less reason for having int[] as the T than int.

As such, considering that one can have arrays of any type, it follows just from the way [ and ] are used in declaring and initialising arrays (and for that matter, {, } and ,), that without some sort of special rule banning arrays of arrays, we get this sort of use "for free".

Now consider also that there are things we can do with jagged arrays that we can't do otherwise:

  1. We can have "jagged" arrays where different inner arrays are of different sizes.
  2. We can have null arrays within the outer array where that is an appropriate mapping of the data, or perhaps to allow lazy building.
  3. We can deliberately alias within the array so e.g. lookup[1] is the same array as lookup[5]. (This can allow for massive savings with some data-sets, e.g. many Unicode properties can be mapped for the full set of 1,112,064 code points in a small amount of memory because leaf arrays of properties can be repeated for ranges with matching patterns).
  4. Some heap implementations can handle the many smaller objects better than one large object in memory.
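The first three points can be sketched in a few lines. The class name and the sizes are hypothetical; the variable name lookup follows the example in the list.

```java
// Illustrative only: ArrayOfArraysDemo and build are hypothetical names.
class ArrayOfArraysDemo {
    static int[][] build() {
        int[][] lookup = new int[6][];   // inner arrays start out as null
        lookup[0] = new int[] {1};       // jagged: rows of different lengths
        lookup[1] = new int[] {1, 2, 3};
        lookup[5] = lookup[1];           // deliberate aliasing of one row
        return lookup;                   // rows 2, 3 and 4 stay null
    }

    public static void main(String[] args) {
        int[][] lookup = build();
        lookup[1][0] = 9;                         // write through one alias
        System.out.println(lookup[5][0]);         // visible through the other
        System.out.println(lookup[2] == null);    // row never built
    }
}
```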

There are certainly cases where this sort of multi-dimensional array is useful.

Now, the default state of any feature is unspecified and unimplemented. Someone needs to decide to specify and implement a feature, or else it wouldn't exist.

As shown above, the array-of-arrays sort of multidimensional array will exist unless someone decides to introduce a special feature banning arrays of arrays. Since arrays of arrays are useful for the reasons above, that would be a strange decision to make.

Conversely, the sort of multidimensional array where an array has a defined rank that can be greater than 1 and so be used with a set of indices rather than a single index, does not follow naturally from what is already defined. Someone would need to:

  1. Decide on the specification for how declaration, initialisation and use would work.
  2. Document it.
  3. Write the actual code to do this.
  4. Test the code to do this.
  5. Handle the bugs, edge-cases, reports of bugs that aren't actually bugs, and backward-compatibility issues caused by fixing the bugs.

Also users would have to learn this new feature.

So, it has to be worth it. Some things that would make it worth it would be:

  1. If there was no way of doing the same thing.
  2. If the way of doing the same thing was strange or not well-known.
  3. If people would expect it from similar contexts.
  4. If users couldn't provide similar functionality themselves.

In this case though:

  1. There is a way of doing the same thing.
  2. Using strides within arrays was already known to C and C++ programmers, and Java built on their syntax so that the same techniques are directly applicable.
  3. Java's syntax was based on C++, and C++ similarly only has direct support for multidimensional arrays as arrays-of-arrays. (Except when statically allocated, but that's not something that would have an analogy in Java, where arrays are objects.)
  4. One can easily write a class that wraps an array and details of stride-sizes and allows access via a set of indices.
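As a minimal sketch of the last point, a wrapper might look like the following. The name Int3D, its methods and the row-major layout are assumptions made for illustration, not an existing library class.

```java
// Illustrative only: a hypothetical fixed-shape 3D int array over one flat block.
final class Int3D {
    private final int[] data;       // single contiguous backing array
    private final int d1, d2, d3;   // sizes of the three dimensions

    Int3D(int d1, int d2, int d3) {
        this.d1 = d1;
        this.d2 = d2;
        this.d3 = d3;
        this.data = new int[d1 * d2 * d3];
    }

    // Scale-and-add: converts (i, j, k) to an offset in the backing array.
    private int offset(int i, int j, int k) {
        if (i < 0 || i >= d1 || j < 0 || j >= d2 || k < 0 || k >= d3)
            throw new IndexOutOfBoundsException();
        return (i * d2 + j) * d3 + k;
    }

    int get(int i, int j, int k) {
        return data[offset(i, j, k)];
    }

    void set(int i, int j, int k, int value) {
        data[offset(i, j, k)] = value;
    }

    // Summing needs no index arithmetic at all: just walk the backing array.
    long sum() {
        long total = 0;
        for (int v : data)
            total += v;
        return total;
    }
}
```

Callers then write grid.get(i, j, k) instead of computing offsets themselves, which keeps the flat layout without scattering index arithmetic through the code.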

Really, the question is not "Why doesn't Java have true multidimensional arrays?" but "Why should it?"

Of course, the points you made in favour of multidimensional arrays are valid, and some languages do have them for that reason, but the burden is nonetheless to argue a feature in, not argue it out.

"(I hear a rumour that C# does something like this, although I also hear another rumour that the CLR implementation is so bad that it's not worth having... perhaps they're just rumours...)"

Like many rumours, there's an element of truth here, but it is not the full truth.

.NET arrays can indeed have multiple ranks. This is not the only way in which it is more flexible than Java. Each rank can also have a lower-bound other than zero. As such, you could for example have an array that goes from -3 to 42 or a two dimensional array where one rank goes from -2 to 5 and another from 57 to 100, or whatever.

C# does not give complete access to all of this from its built-in syntax (you need to call Array.CreateInstance() for lower bounds other than zero), but it does allow you to use the syntax int[,] for a two-dimensional array of int, int[,,] for a three-dimensional array, and so on.

Now, the extra work involved in dealing with lower bounds other than zero adds a performance burden, and yet these cases are relatively uncommon. For that reason single-rank arrays with a lower-bound of 0 are treated as a special case with a more performant implementation. Indeed, they are internally a different sort of structure.

In .NET, multi-dimensional arrays with lower bounds of zero are treated as multi-dimensional arrays whose lower bounds just happen to be zero (that is, as an instance of the slower case), rather than the faster case being extended to handle ranks greater than 1.

Of course, .NET could have had a fast-path case for zero-based multi-dimensional arrays, but then all the reasons for Java not having them would apply, along with the fact that there is already one special case. Special cases suck, and then there would be two of them, which would suck more. (As it is, one can have some issues with trying to assign a value of one type to a variable of the other type.)

Not a single thing above shows clearly that Java couldn't possibly have had the sort of multi-dimensional array you talk of; it would have been a sensible enough decision, but the decision that was made was also sensible.
