Write a "compressed" array to increase IO performance?

Problem description

I have an int array and a float array, each of length 220 million (fixed). I want to store these arrays to disk and load them back into memory. Currently, I am using Java NIO's FileChannel and MappedByteBuffer to do this. It works fine, but it takes about 5 seconds (wall-clock time) to store or load the arrays. Now I want to make it faster.
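The question does not show its code, so the following is only an assumed reconstruction of a plain FileChannel/MappedByteBuffer baseline, the kind of thing any "skip the zeros" format should be timed against. The class name `MappedIntArrayIO` and its methods are hypothetical. It assumes native byte order, so the files are not portable between machines with different endianness.

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedIntArrayIO {
    // Store the whole int[] as raw 4-byte values through a memory-mapped region.
    static void store(Path file, int[] array) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            // Mapping past the end of the file grows the file to the requested size.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4L * array.length);
            buf.order(ByteOrder.nativeOrder()).asIntBuffer().put(array);
            buf.force(); // ask the OS to write the dirty pages to the storage device
        }
    }

    // Load the file back; the element count is derived from the file size.
    static int[] load(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            int[] array = new int[(int) (ch.size() / 4)];
            buf.order(ByteOrder.nativeOrder()).asIntBuffer().get(array);
            return array;
        }
    }
}
```

The float array can be handled the same way with `asFloatBuffer()`. Each mapping must stay under 2 GB, which 220 million 4-byte values (880 MB) do.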

I should mention that most of the array elements are 0 (nearly 52%).

For example:

int[] arr1 = {0, 0, 6, 7, 1, 0, 0, ...};

Can anybody help? Is there a good way to improve speed by not storing or loading those 0s? The skipped zeros can be restored on load with Arrays.fill(array, 0).

Recommended answer

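The following approach requires n / 8 + nz * 4 bytes on disk, where n is the size of the array and nz is the number of non-zero entries. The raw array takes 4n bytes, so the one-bit-per-entry bitmap costs n / 8 bytes, i.e. 1/32 (about 3%) of the original size. For 52% zero entries, you'd therefore reduce storage size by about 52% - 3% = 49%.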
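As a quick check of that arithmetic for the question's sizes (n = 220 million, 52% zeros, so 48% non-zero), here is a small sketch; the class and method names are hypothetical, and the formula ignores the few header bytes and the rounding of the bitmap up to whole words.

```java
public class SparseSize {
    // Bytes needed for a plain int[] of n elements.
    static long denseBytes(long n) {
        return 4L * n;
    }

    // Bytes for the zero-bitmap format: one bit per index plus 4 bytes per non-zero value.
    static long sparseBytes(long n, long nz) {
        return n / 8 + 4L * nz;
    }

    public static void main(String[] args) {
        long n = 220_000_000L;
        long nz = n * 48 / 100;              // 52% zeros => 105,600,000 non-zero entries
        long dense = denseBytes(n);          // 880,000,000
        long sparse = sparseBytes(n, nz);    // 27,500,000 + 422,400,000 = 449,900,000
        double saved = 100.0 * (dense - sparse) / dense;
        // prints "dense=880000000 sparse=449900000 saved=48.875%"
        System.out.printf("dense=%d sparse=%d saved=%.3f%%%n", dense, sparse, saved);
    }
}
```

The saving is exactly the zero fraction minus 1/32 (52% - 3.125% = 48.875%), which the answer rounds to 49%.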

You can do it like this:

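// Needs java.io.DataInputStream, java.io.DataOutputStream, java.io.IOException, java.util.BitSet
// (wrap the underlying streams in BufferedOutputStream / BufferedInputStream for speed)
void write(DataOutputStream out, int[] array) throws IOException {
    BitSet zeroes = new BitSet(array.length);
    for (int i = 0; i < array.length; i++)
        zeroes.set(i, array[i] == 0);
    out.writeInt(array.length);           // BitSet.length() cannot recover n, so store it
    long[] words = zeroes.toLongArray();  // one bit per index
    out.writeInt(words.length);
    for (long word : words)
        out.writeLong(word);
    for (int i = 0; i < array.length; i++)
        if (array[i] != 0)
            out.writeInt(array[i]);       // only the non-zero values
}

int[] read(DataInputStream in) throws IOException {
    int[] array = new int[in.readInt()];
    long[] words = new long[in.readInt()];
    for (int w = 0; w < words.length; w++)
        words[w] = in.readLong();
    BitSet zeroes = BitSet.valueOf(words);
    for (int i = 0; i < array.length; i++) {
        if (zeroes.get(i)) {
            // nothing to do (array[i] was initialized to 0)
        } else {
            array[i] = in.readInt();
        }
    }
    return array;
}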

Edit: If, as you say, this is slightly slower, then the disk is not the bottleneck. You could tune the above approach by writing the bitset as you construct it, so you don't have to build the whole bitset in memory before writing it to disk. Also, by writing the bitset word by word, interspersed with the actual data, we can make only a single pass over the array, reducing cache misses:

void write(DataOutputStream out, int[] array) throws IOException {
    out.writeInt(array.length);
    int ni;
    for (int i = 0; i < array.length; i = ni) {
        ni = i + 32;
        // bit k of zeroesMap is set if array[i + k] == 0
        int zeroesMap = 0;
        for (int j = i + 31; j >= i; j--) {
            zeroesMap <<= 1;
            if (array[j] == 0) {
                zeroesMap |= 1;
            }
        }
        out.writeInt(zeroesMap);
        // followed by the non-zero values of this 32-element slice
        for (int j = i; j < ni; j++) {
            if (array[j] != 0) {
                out.writeInt(array[j]);
            }
        }
    }
}

int[] read(DataInputStream in) throws IOException {
    int[] array = new int[in.readInt()];
    int ni;
    for (int i = 0; i < array.length; i = ni) {
        ni = i + 32;
        int zeroesMap = in.readInt();
        for (int j = i; j < ni; j++) {
            if ((zeroesMap & 1) == 1) {
                // nothing to do (array[j] was initialized to 0)
            } else {
                array[j] = in.readInt();
            }
            zeroesMap >>>= 1;
        }
    }
    return array;
}

(The preceding code assumes array.length is a multiple of 32. If it is not, write the last slice of the array in whatever way you like.)
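One possible way to handle the last slice is to clamp each slice at the end of the array, so the final zero map simply describes fewer than 32 entries and its unused high bits are never read. This is a sketch under that assumption, not the answerer's code; the class name `ZeroMapCodec` is hypothetical, and it round-trips through an in-memory buffer only to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class ZeroMapCodec {
    // Writes n, then for each slice of up to 32 elements: a zero map
    // (bit k set if array[i + k] == 0) followed by the slice's non-zero values.
    static void write(DataOutputStream out, int[] array) throws IOException {
        out.writeInt(array.length);
        for (int i = 0; i < array.length; i += 32) {
            int end = Math.min(i + 32, array.length); // last slice may be shorter
            int zeroesMap = 0;
            for (int j = end - 1; j >= i; j--) {
                zeroesMap <<= 1;
                if (array[j] == 0) zeroesMap |= 1;
            }
            out.writeInt(zeroesMap);
            for (int j = i; j < end; j++) {
                if (array[j] != 0) out.writeInt(array[j]);
            }
        }
    }

    static int[] read(DataInputStream in) throws IOException {
        int[] array = new int[in.readInt()]; // new Java arrays are zero-filled
        for (int i = 0; i < array.length; i += 32) {
            int end = Math.min(i + 32, array.length);
            int zeroesMap = in.readInt();
            for (int j = i; j < end; j++, zeroesMap >>>= 1) {
                if ((zeroesMap & 1) == 0) array[j] = in.readInt();
            }
        }
        return array;
    }

    public static void main(String[] args) throws IOException {
        int[] original = {0, 0, 6, 7, 1, 0, 0}; // 7 elements: a single partial slice
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            write(out, original);
        }
        int[] copy = read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        // prints "true 20": 4 (length) + 4 (zero map) + 3 * 4 (values 6, 7, 1)
        System.out.println(Arrays.equals(original, copy) + " " + bytes.size());
    }
}
```

For the real 220-million-element arrays, the DataOutputStream and DataInputStream should wrap a BufferedOutputStream / BufferedInputStream around the file streams; otherwise every writeInt or readInt call goes to the file separately.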

If that doesn't reduce processing time either, compression is not the way to go (I don't think any general-purpose compression algorithm will be faster than the above).
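To check that claim on real data, a general-purpose baseline is easy to time against. This sketch uses the JDK's `java.util.zip` Deflater at `BEST_SPEED`; the class name `DeflateBaseline` is hypothetical, and it compresses into memory only so the example is self-contained (swap the ByteArrayOutputStream for a FileOutputStream to measure disk IO).

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class DeflateBaseline {
    // Writes n followed by every int, compressed with DEFLATE at its fastest level.
    static byte[] compress(int[] array) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new DeflaterOutputStream(bytes, deflater)))) {
            out.writeInt(array.length);
            for (int v : array) out.writeInt(v);
        } finally {
            // runs after the stream is closed; a caller-supplied Deflater is not ended by close()
            deflater.end();
        }
        return bytes.toByteArray();
    }

    static int[] decompress(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(
                new InflaterInputStream(new ByteArrayInputStream(data))))) {
            int[] array = new int[in.readInt()];
            for (int i = 0; i < array.length; i++) array[i] = in.readInt();
            return array;
        }
    }
}
```

Unlike the zero-map formats, this spends CPU time searching for repeated byte patterns in every value, which is why it is likely, though not guaranteed, to be slower for data whose only redundancy is the zeros.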