Why doesn't Java have true multidimensional arrays?


Question

The TL;DR version, for those who don't want the background, is the following specific question:

Why doesn't Java have an implementation of true multidimensional arrays? Is there a solid technical reason? What am I missing here?

Background

Java has multidimensional arrays at the syntax level, in that one can declare

int[][] arr = new int[10][10];

but it seems that this is really not what one might have expected. Rather than having the JVM allocate a contiguous block of RAM big enough to store 100 ints, it comes out as an array of arrays of ints: so each layer is a contiguous block of RAM, but the thing as a whole is not. Accessing arr[i][j] is thus rather slow: the JVM has to

  1. find the int[] stored at arr[i];
  2. index this to find the int stored at arr[i][j].

This involves querying an object to go from one layer to the next, which is rather expensive.

At one level, it's not hard to see why this can't be optimised to a simple scale-and-add lookup even if it were all allocated in one fixed block. The problem is that arr[3] is a reference all of its own, and it can be changed. So although arrays are of fixed size, we could easily write

arr[3] = new int[11];

and now the scale-and-add is screwed because this layer has grown. You'd need to know at runtime whether everything is still the same size as it used to be. In addition, of course, this will then get allocated somewhere else in RAM (it'll have to be, since it's bigger than what it's replacing), so it's not even in the right place for scale-and-add.
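To make the point concrete, here is a minimal sketch of that reassignment. The class and method names (`RowReplacement`, `rowLengths`, `buildAndReplace`) are made up for illustration and are not part of the original question.

```java
import java.util.Arrays;

// Illustrative only: shows that a row of an int[][] can be swapped for a longer one.
final class RowReplacement {

    /** Returns the length of every row in an array-of-arrays structure. */
    static int[] rowLengths(int[][] arr) {
        int[] lengths = new int[arr.length];
        for (int i = 0; i < arr.length; i++) {
            lengths[i] = arr[i].length;
        }
        return lengths;
    }

    /** Builds a 10x10 array, then replaces row 3 with an 11-element row. */
    static int[][] buildAndReplace() {
        int[][] arr = new int[10][10];
        arr[3] = new int[11]; // legal: arr[3] is only a reference to some int[]
        return arr;
    }

    public static void main(String[] args) {
        // Prints the ten row lengths; row 3 now differs from the others.
        System.out.println(Arrays.toString(rowLengths(buildAndReplace())));
    }
}
```

Once a row can differ in length like this, a single fixed stride no longer describes the whole structure.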

It seems to me that this is not ideal, and that for two reasons.

For one, it's slow. A test I ran with these methods for summing the contents of a single dimensional or multidimensional array took nearly twice as long (714 seconds vs 371 seconds) for the multidimensional case (an int[1000000] and an int[100][100][100] respectively, filled with random int values, run 1000000 times with warm cache).

public static long sumSingle(int[] arr) {
    long total = 0;
    for (int i=0; i<arr.length; i++)
        total+=arr[i];
    return total;
}

public static long sumMulti(int[][][] arr) {
    long total = 0;
    for (int i=0; i<arr.length; i++)
        for (int j=0; j<arr[0].length; j++)
            for (int k=0; k<arr[0][0].length; k++)
                total+=arr[i][j][k];
    return total;
}   

Secondly, because it's slow, it thereby encourages obscure coding. If you encounter something performance-critical that would be naturally done with a multidimensional array, you have an incentive to write it as a flat array, even if that makes the code unnatural and hard to read. You're left with an unpalatable choice: obscure code or slow code.
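For reference, the flat-array workaround might look like the following sketch. It assumes a row-major layout; the names `FlatIndexing`, `offset` and `sumFlat` are invented for this example.

```java
// Illustrative only: a three-dimensional data set stored in one flat int[].
final class FlatIndexing {

    /**
     * Maps (i, j, k) to an offset in a row-major flat array whose
     * second and third dimensions have sizes d1 and d2.
     */
    static int offset(int i, int j, int k, int d1, int d2) {
        return (i * d1 + j) * d2 + k; // the "scale-and-add" computation
    }

    /** Sums a d0 x d1 x d2 data set stored in a flat array. */
    static long sumFlat(int[] flat, int d0, int d1, int d2) {
        long total = 0;
        for (int i = 0; i < d0; i++)
            for (int j = 0; j < d1; j++)
                for (int k = 0; k < d2; k++)
                    total += flat[offset(i, j, k, d1, d2)];
        return total;
    }

    public static void main(String[] args) {
        int d0 = 2, d1 = 3, d2 = 4;
        int[] flat = new int[d0 * d1 * d2];
        for (int n = 0; n < flat.length; n++) {
            flat[n] = n; // element value equals its offset
        }
        System.out.println(sumFlat(flat, d0, d1, d2));
    }
}
```

The index arithmetic is what makes this style harder to read than `arr[i][j][k]`: every access has to carry the dimension sizes with it.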

It seems to me that the basic problem could easily enough be fixed. The only reason, as we saw earlier, that it can't be optimised is that the structure might change. But Java already has a mechanism for making references unchangeable: declare them as final.

Now, simply declaring it as

final int[][] arr = new int[10][10];

isn't good enough because it's only arr that is final here: arr[3] still isn't, and could be changed, so the structure might still change. But if we had a way of declaring things so that it was final throughout, except at the bottom layer where the int values are stored, then we'd have an entire immutable structure, and it could all be allocated as one block, and indexed with scale-and-add.
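A small sketch of why `final` on the variable is not enough. The class name `FinalIsShallow` and its method are hypothetical.

```java
// Illustrative only: final applies to the variable, not to the array's elements.
final class FinalIsShallow {

    /** Replaces a row of a final int[][] and returns the new row's length. */
    static int replacedRowLength() {
        final int[][] arr = new int[10][10];
        // arr = new int[5][5];  // would not compile: the variable arr is final
        arr[3] = new int[11];    // compiles: the element arr[3] is not final
        return arr[3].length;
    }

    public static void main(String[] args) {
        System.out.println(replacedRowLength());
    }
}
```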

How it would look syntactically, I'm not sure (I'm not a language designer). Maybe

final int[final][] arr = new int[10][10];

although admittedly that looks a bit weird. This would mean: final at the top layer; final at the next layer; not final at the bottom layer (else the int values themselves would be immutable).

Finality throughout would enable the JIT compiler to optimise this to give performance comparable to that of a single-dimensional array, which would then take away the temptation to code that way just to get round the slowness of multidimensional arrays.

(I hear a rumour that C# does something like this, although I also hear another rumour that the CLR implementation is so bad that it's not worth having... perhaps they're just rumours...)

So why doesn't Java have an implementation of true multidimensional arrays? Is there a solid technical reason? What am I missing here?

Update

A bizarre side note: the difference in timings drops away to only a few percent if you use an int for the running total rather than a long. Why would there be such a small difference with an int, and such a big difference with a long?

Code I used for benchmarking, in case anyone wants to try to reproduce these results:

import java.util.Random;

public class Multidimensional {

    public static long sumSingle(final int[] arr) {
        long total = 0;
        for (int i=0; i<arr.length; i++)
            total+=arr[i];
        return total;
    }

    public static long sumMulti(final int[][][] arr) {
        long total = 0;
        for (int i=0; i<arr.length; i++)
            for (int j=0; j<arr[0].length; j++)
                for (int k=0; k<arr[0][0].length; k++)
                    total+=arr[i][j][k];
        return total;
    }   

    public static void main(String[] args) {
        final int iterations = 1000000;

        Random r = new Random();
        int[] arr = new int[1000000];
        for (int i=0; i<arr.length; i++)
            arr[i]=r.nextInt();
        long total = 0;
        System.out.println(sumSingle(arr));
        long time = System.nanoTime();
        for (int i=0; i<iterations; i++)
            total = sumSingle(arr);
        time = System.nanoTime()-time;
        System.out.printf("Took %d ms for single dimension\n", time/1000000, total);

        int[][][] arrMulti = new int[100][100][100];
        for (int i=0; i<arrMulti.length; i++)
            for (int j=0; j<arrMulti[i].length; j++)
                for (int k=0; k<arrMulti[i][j].length; k++)
                    arrMulti[i][j][k]=r.nextInt();
        System.out.println(sumMulti(arrMulti));
        time = System.nanoTime();
        for (int i=0; i<iterations; i++)
            total = sumMulti(arrMulti);
        time = System.nanoTime()-time;
        System.out.printf("Took %d ms for multi dimension\n", time/1000000, total);
    }

}

Recommended answer

but it seems that this is really not what one might have expected.

Why?

Consider that the form T[] means "array of type T", then just as we would expect int[] to mean "array of type int", we would expect int[][] to mean "array of type array of type int", because there's no less reason for having int[] as the T than int.

As such, considering that one can have arrays of any type, it follows just from the way [ and ] are used in declaring and initialising arrays (and for that matter, {, } and ,), that without some sort of special rule banning arrays of arrays, we get this sort of use "for free".

Now consider also that there are things we can do with jagged arrays that we can't do otherwise:

  1. We can have "jagged" arrays where different inner arrays are of different sizes.
  2. We can have null arrays within the outer array where that is an appropriate mapping of the data, or perhaps to allow lazy building.
  3. We can deliberately alias within the array so e.g. lookup[1] is the same array as lookup[5]. (This can allow for massive savings with some data-sets, e.g. many Unicode properties can be mapped for the full set of 1,112,064 code points in a small amount of memory because leaf arrays of properties can be repeated for ranges with matching patterns).
  4. Some heap implementations can handle the many smaller objects better than one large object in memory.
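The first three points can be sketched in a few lines. This is only an illustration; the class name `JaggedUses` and the contents of `lookup` are invented.

```java
// Illustrative only: jagged rows, null rows and aliased rows in one int[][].
final class JaggedUses {

    /** Builds an outer array of six rows, most of them left unallocated. */
    static int[][] build() {
        int[][] lookup = new int[6][];       // inner arrays start out as null
        lookup[0] = new int[] {1};           // jagged: rows of different lengths
        lookup[1] = new int[] {1, 2, 3};
        lookup[5] = lookup[1];               // aliasing: rows 1 and 5 share storage
        return lookup;                       // rows 2, 3 and 4 remain null
    }

    public static void main(String[] args) {
        int[][] lookup = build();
        lookup[1][0] = 42;                   // also visible through lookup[5]
        System.out.println(lookup[5][0]);
        System.out.println(lookup[2] == null);
    }
}
```

None of this could be expressed by a single contiguous rank-2 block, since such a block has exactly one length per dimension and no per-row references.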

There are certainly cases where this sort of multi-dimensional array is useful.

Now, the default state of any feature is unspecified and unimplemented. Someone needs to decide to specify and implement a feature, or else it wouldn't exist.

As shown above, the array-of-arrays sort of multidimensional array will exist unless someone decides to introduce a special feature banning arrays of arrays. Since arrays of arrays are useful for the reasons above, that would be a strange decision to make.

Conversely, the sort of multidimensional array where an array has a defined rank that can be greater than 1 and so be used with a set of indices rather than a single index, does not follow naturally from what is already defined. Someone would need to:

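  1. Decide on the specification for how declaration, initialisation and use would work.
  2. Document it.
  3. Write the actual code to do this.
  4. Test the code to do this.
  5. Handle the bugs, edge cases, reports of bugs that aren't actually bugs, and backward-compatibility issues caused by fixing the bugs.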

Also users would have to learn this new feature.

So, it has to be worth it. Some things that would make it worth it would be:

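  1. If there were no way to do the same thing.
  2. If the way to do the same thing were strange or not well known.
  3. If people would expect it from similar contexts.
  4. If users couldn't provide similar functionality themselves.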

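In this case:

  1. But there is.
  2. Using strides within arrays was already well known to C and C++ programmers, and Java built on their syntax, so the same techniques are directly applicable.
  3. Java's syntax was based on C++, and C++ similarly only has direct support for multidimensional arrays as arrays of arrays. (Except when statically allocated, but that has no analogue in Java, where arrays are objects.)
  4. One can easily write a class that wraps an array and the details of stride sizes, and allows access via a set of indices.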
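As a rough sketch of the last point, such a wrapper might look like the class below. The name `IntGrid3D`, its methods and the row-major layout are all assumptions made for this example; the answer does not prescribe any particular design.

```java
// Illustrative only: a rank-3 int array backed by one contiguous int[].
final class IntGrid3D {
    private final int[] data;
    private final int d0, d1, d2;

    IntGrid3D(int d0, int d1, int d2) {
        this.d0 = d0;
        this.d1 = d1;
        this.d2 = d2;
        this.data = new int[d0 * d1 * d2];
    }

    /** Checks each index against its dimension, then applies scale-and-add. */
    private int index(int i, int j, int k) {
        if (i < 0 || i >= d0 || j < 0 || j >= d1 || k < 0 || k >= d2) {
            throw new IndexOutOfBoundsException("(" + i + ", " + j + ", " + k + ")");
        }
        return (i * d1 + j) * d2 + k;
    }

    int get(int i, int j, int k) {
        return data[index(i, j, k)];
    }

    void set(int i, int j, int k, int value) {
        data[index(i, j, k)] = value;
    }

    /** Sums every element by walking the backing array directly. */
    long sum() {
        long total = 0;
        for (int v : data) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        IntGrid3D grid = new IntGrid3D(2, 3, 4);
        grid.set(1, 2, 3, 7);
        System.out.println(grid.get(1, 2, 3));
        System.out.println(grid.sum());
    }
}
```

Callers see three indices, while the storage stays a single block whose shape cannot change after construction.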

Really, the question is not "why doesn't Java have true multidimensional arrays"? But "Why should it?"

Of course, the points you made in favour of multidimensional arrays are valid, and some languages do have them for that reason, but the burden is nonetheless to argue a feature in, not argue it out.

(I hear a rumour that C# does something like this, although I also hear another rumour that the CLR implementation is so bad that it's not worth having... perhaps they're just rumours...)

Like many rumours, there's an element of truth here, but it is not the full truth.

.NET arrays can indeed have multiple ranks. This is not the only way in which it is more flexible than Java. Each rank can also have a lower-bound other than zero. As such, you could for example have an array that goes from -3 to 42 or a two dimensional array where one rank goes from -2 to 5 and another from 57 to 100, or whatever.

C# does not give complete access to all of this from its built-in syntax (you need to call Array.CreateInstance() for lower bounds other than zero), but it does allow you to use the syntax int[,] for a two-dimensional array of int, int[,,] for a three-dimensional array, and so on.

Now, the extra work involved in dealing with lower bounds other than zero adds a performance burden, and yet these cases are relatively uncommon. For that reason single-rank arrays with a lower-bound of 0 are treated as a special case with a more performant implementation. Indeed, they are internally a different sort of structure.

In .NET multi-dimensional arrays with lower bounds of zero are treated as multi-dimensional arrays whose lower bounds just happen to be zero (that is, as an example of the slower case) rather than the faster case being able to handle ranks greater than 1.

Of course, .NET could have had a fast-path case for zero-based multi-dimensional arrays, but then all the reasons for Java not having them would apply, together with the fact that there's already one special case. Special cases suck, and then there would be two special cases and they would suck more. (As it is, one can have some issues with trying to assign a value of one type to a variable of the other type).

Not a single thing above shows clearly that Java couldn't possibly have had the sort of multi-dimensional array you talk of; it would have been a sensible enough decision, but the decision that was made was also sensible.
