Write "compressed" Array to increase IO performance?

Question

I have an int array and a float array, each of fixed length 220 million. I want to store these arrays to disk and load them back into memory. Currently I use Java NIO's FileChannel and MappedByteBuffer for this. It works, but storing or loading the arrays takes about 5 seconds of wall-clock time, and I want to make it faster.
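For reference, the baseline described above might look roughly like the following. This is only a sketch under assumptions: the asker's actual code is not shown, and the class name `MappedIntArrayIO` and its `store`/`load` methods are hypothetical. It uses `FileChannel.map` and views the mapped bytes as an `IntBuffer` for bulk transfer.

```java
import java.io.IOException;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not the asker's code: plain (uncompressed) storage of an int[].
final class MappedIntArrayIO {

    // Writes the array as 4 bytes per element through a read-write mapping.
    static void store(int[] array, Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            IntBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4L * array.length)
                              .asIntBuffer();
            buf.put(array); // bulk copy into the mapped region
        }
    }

    // Reads the whole file back into a freshly allocated int[].
    static int[] load(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            int[] array = new int[(int) (ch.size() / 4)];
            ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size()).asIntBuffer().get(array);
            return array;
        }
    }
}
```

The float array could be handled the same way with `asFloatBuffer()`. In this layout every zero still costs 4 bytes on disk, which is what the question asks to avoid.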

I should mention that most of the array elements are 0 (nearly 52%).

Like this:

int[] arr1 = {0, 0, 6, 7, 1, 0, 0, ...};

Can anybody help me: is there a nice way to improve speed by not storing or loading those 0s? The zeros could be restored afterwards with Arrays.fill(array, 0).

Recommended answer

The following approach requires n / 8 + nz * 4 bytes on disk, where n is the size of the array and nz is the number of non-zero entries. The bitmap costs n / 8 bytes, about 3% of the 4 * n bytes of the raw array, so with 52% zero entries you'd reduce storage size by roughly 52% - 3% = 49%.
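The arithmetic behind that estimate can be checked with a few lines of Java. This is a small sketch; the class name `StorageEstimate` and its methods are made up for illustration and only restate the formula n / 8 + nz * 4.

```java
// Hypothetical helper that evaluates the size formula from the answer.
final class StorageEstimate {

    // Bitmap scheme: one bit per element (rounded up to whole bytes)
    // plus 4 bytes for every non-zero element.
    static long compressedBytes(long n, long nonZero) {
        return (n + 7) / 8 + nonZero * 4L;
    }

    // Plain layout: 4 bytes per element.
    static long rawBytes(long n) {
        return n * 4L;
    }
}
```

For n = 220,000,000 with 52% zeros (105,600,000 non-zero entries), the raw array takes 880,000,000 bytes and the bitmap layout takes 27,500,000 + 422,400,000 = 449,900,000 bytes, a saving of about 49%.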

You could do it like this:

// writeInt, readInt, write(BitSet) and readBitSet are I/O helpers not shown here.
void write(int[] array) {
    // BitSet.length() ignores trailing clear bits, so store the array size explicitly.
    writeInt(array.length);
    BitSet zeroes = new BitSet(array.length);
    for (int i = 0; i < array.length; i++)
        zeroes.set(i, array[i] == 0);
    write(zeroes); // one bit per index
    for (int i = 0; i < array.length; i++)
        if (array[i] != 0)
            writeInt(array[i]);
}

int[] read() {
    int[] array = new int[readInt()];
    BitSet zeroes = readBitSet();
    for (int i = 0; i < array.length; i++) {
        if (zeroes.get(i)) {
            // nothing to do (array[i] was initialized to 0)
        } else {
            array[i] = readInt();
        }
    }
    return array;
}

Edit: That you say this is slightly slower implies that the disk is not the bottleneck. You could tune the above approach by writing the bitset as you construct it, so you don't have to build the whole bitset in memory before writing it to disk. Also, by writing the bitset word by word, interspersed with the actual data, we need only a single pass over the array, which reduces cache misses:

void write(int[] array) {
    writeInt(array.length);
    int ni;
    for (int i = 0; i < array.length; i = ni) {
        ni = i + 32;
        // Build a 32-bit map for this block: bit k is set if array[i + k] == 0.
        int zeroesMap = 0;
        for (int j = i + 31; j >= i; j--) {
            zeroesMap <<= 1;
            if (array[j] == 0) {
                zeroesMap |= 1;
            }
        }
        writeInt(zeroesMap);
        // Follow the map with the non-zero values of the block, in order.
        for (int j = i; j < ni; j++) {
            if (array[j] != 0) {
                writeInt(array[j]);
            }
        }
    }
}

int[] read() {
    int[] array = new int[readInt()];
    int ni;
    for (int i = 0; i < array.length; i = ni) {
        ni = i + 32;
        int zeroesMap = readInt();
        for (int j = i; j < ni; j++) {
            // Parentheses are required: == binds tighter than & in Java.
            if ((zeroesMap & 1) == 1) {
                // nothing to do (array[j] was initialized to 0)
            } else {
                array[j] = readInt();
            }
            zeroesMap >>>= 1;
        }
    }
    return array;
}

(The preceding code assumes array.length is a multiple of 32. If it is not, write the last slice of the array in whatever way you like.)
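One possible way to handle the last slice is to let the final block be shorter than 32 elements and leave its unused map bits at zero. The sketch below is an illustration under assumptions, not the answerer's code: the class name `ZeroMapCodec`, the `DataOutputStream`/`DataInputStream` transport, and the `encode`/`decode` convenience methods are all introduced here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical codec: a 32-bit zero map per block, followed by the block's non-zero values.
final class ZeroMapCodec {

    static void write(int[] array, DataOutputStream out) throws IOException {
        out.writeInt(array.length);
        for (int i = 0; i < array.length; i += 32) {
            int end = Math.min(i + 32, array.length); // the last block may be shorter
            int zeroesMap = 0;
            for (int j = end - 1; j >= i; j--) {      // bit k marks array[i + k] == 0
                zeroesMap <<= 1;
                if (array[j] == 0) zeroesMap |= 1;
            }
            out.writeInt(zeroesMap);
            for (int j = i; j < end; j++) {
                if (array[j] != 0) out.writeInt(array[j]);
            }
        }
    }

    static int[] read(DataInputStream in) throws IOException {
        int[] array = new int[in.readInt()];
        for (int i = 0; i < array.length; i += 32) {
            int end = Math.min(i + 32, array.length);
            int zeroesMap = in.readInt();
            for (int j = i; j < end; j++) {
                if ((zeroesMap & 1) == 0) array[j] = in.readInt(); // clear bit: value follows
                zeroesMap >>>= 1;
            }
        }
        return array;
    }

    // In-memory convenience wrappers, mainly for experiments and tests.
    static byte[] encode(int[] array) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            write(array, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static int[] decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For real files, the streams would presumably be wrapped in a `BufferedOutputStream`/`BufferedInputStream` over the file, since unbuffered 4-byte writes would likely dominate the running time.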

If that doesn't reduce processing time either, compression is not the way to go (I don't think any general-purpose compression algorithm will be faster than the above).