Can array access be optimized?

Problem Description


Maybe I'm being misled by my profiler (NetBeans), but I'm seeing some odd behavior, and I'm hoping someone here can help me understand it.

I am working on an application which makes heavy use of rather large hash tables (keys are longs, values are objects). The performance with the built-in Java hash table (HashMap specifically) was very poor, and after trying some alternatives -- Trove, Fastutils, Colt, Carrot -- I started working on my own.

The code is very basic, using a double-hashing strategy. This works well and shows the best performance of all the options I've tried so far.

The catch is that, according to the profiler, the hash table lookup is the single most expensive method in the entire application -- despite the fact that other methods are called many more times and/or do a lot more logic.

What really confuses me is that the lookups are called by only one class; the calling method does the lookup and processes the results. Both are called nearly the same number of times, and the method that calls the lookup has a lot of logic in it to handle the result of the lookup, yet it is about 100x faster.

Below is the code for the hash lookup. It's basically just two accesses into an array (the functions that compute the hash codes, according to profiling, are virtually free). I don't understand how this bit of code can be so slow since it is just array access, and I don't see any way of making it faster.

Note that the code simply returns the bucket matching the key; the caller is expected to process the bucket. 'size' is hash.length/2; hash1 does lookups in the first half of the hash table and hash2 in the second half. key_index is a final int field on the hash table, passed into the constructor, and the values array on the Entry objects is a small array of longs, usually of length 10 or less.

Any thoughts people have on this are much appreciated.

Thanks.

public final Entry get(final long theKey) {
    // First probe: index into the lower half of the table.
    Entry aEntry = hash[hash1(theKey, size)];

    if (aEntry != null && aEntry.values[key_index] != theKey) {
        // Slot holds a different key: second probe, into the upper half.
        aEntry = hash[hash2(theKey, size)];

        if (aEntry != null && aEntry.values[key_index] != theKey) {
            // Second slot also holds a different key: no match.
            return null;
        }
    }

    // Either the matching bucket, or null if the probed slot was empty.
    return aEntry;
}

Edit: the code for hash1 & hash2

private static int hash1(final long key, final int hashTableSize) { 
    return (int)(key&(hashTableSize-1)); 
}
private static int hash2(final long key, final int hashTableSize) { 
    return (int)(hashTableSize+((key^(key>>3))&(hashTableSize-1))); 
}
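
As a quick illustration of how the two functions split the table (assuming, as the masking suggests, that hashTableSize is a power of two), here is a tiny stand-alone driver -- not part of the original code -- that prints the index each function produces. hash1 always lands in [0, hashTableSize) and hash2 in [hashTableSize, 2*hashTableSize), so the two probes address disjoint halves of the hash array:

public class HashRangeDemo {
    // Copies of hash1/hash2 from the question, for demonstration only.
    private static int hash1(final long key, final int hashTableSize) {
        return (int) (key & (hashTableSize - 1));
    }

    private static int hash2(final long key, final int hashTableSize) {
        return (int) (hashTableSize + ((key ^ (key >> 3)) & (hashTableSize - 1)));
    }

    public static void main(String[] args) {
        final int size = 8; // 'size' from the question: half the table length, a power of two
        for (long key = 100; key < 105; key++) {
            // hash1 stays in [0, 8), hash2 stays in [8, 16)
            System.out.printf("key=%d  hash1=%d  hash2=%d%n",
                    key, hash1(key, size), hash2(key, size));
        }
    }
}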

Solution

Nothing in your implementation strikes me as particularly inefficient. I'll admit I don't really follow your hashing/lookup strategy, but if you say it's performant in your circumstances, I'll believe you.

The only thing that I would expect might make some difference is to move the key out of the values array of Entry.

Instead of having this:

class Entry {
    long[] values;
}

//...
if ( entry.values[key_index] == key ) { //...

Try this:

class Entry {
    long key;
    long values[];
}

//...
if ( entry.key == key ) { //...

Instead of incurring the cost of accessing a member, plus a bounds check, plus reading a value out of the array, you would only incur the cost of accessing the member.
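
Applied to the get() method from the question, the change would look roughly like this -- a sketch only, and it assumes Entry gains a long key field that is populated wherever entries are inserted:

// Sketch: same double-hash lookup as before, but comparing a dedicated
// 'key' field on Entry instead of values[key_index].
public final Entry get(final long theKey) {
    Entry aEntry = hash[hash1(theKey, size)];
    if (aEntry != null && aEntry.key != theKey) {
        aEntry = hash[hash2(theKey, size)];
        if (aEntry != null && aEntry.key != theKey) {
            return null;
        }
    }
    return aEntry;
}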

Is there a random-access data type faster than an array?

I was interested in the answer to this question, so I set up a test environment. This is my Array interface:

interface Array {
    long get(int i);
    void set(int i, long v);
}

This "Array" has undefined behaviour when indices are out of bounds. I threw together the obvious implementation:

class NormalArray implements Array {
    private long[] data;

    public NormalArray(int size) {
        data = new long[size];
    }

    @Override
    public long get(int i) {
        return data[i];
    }

    @Override
    public void set(int i, long v) {
        data[i] = v;
    }
}

And then a control:

class NoOpArray implements Array {
    @Override
    public long get(int i) {
        return 0;
    }
    @Override
    public void set(int i, long v) {
    }
}

Finally, I designed an "array" where the first 10 indices are hardcoded members. The members are set/selected through a switch:

class TenArray implements Array {
    private long v0;
    private long v1;
    private long v2;
    private long v3;
    private long v4;
    private long v5;
    private long v6;
    private long v7;
    private long v8;
    private long v9;
    private long[] extras;

    public TenArray(int size) {
        if (size > 10) {
            extras = new long[size - 10];
        }
    }

    @Override
    public long get(final int i) {
        switch (i) {
        case 0:
            return v0;
        case 1:
            return v1;
        case 2:
            return v2;
        case 3:
            return v3;
        case 4:
            return v4;
        case 5:
            return v5;
        case 6:
            return v6;
        case 7:
            return v7;
        case 8:
            return v8;
        case 9:
            return v9;
        default:
            return extras[i - 10];
        }
    }

    @Override
    public void set(final int i, final long v) {
        switch (i) {
        case 0:
            v0 = v; break;
        case 1:
            v1 = v; break;
        case 2:
            v2 = v; break;
        case 3:
            v3 = v; break;
        case 4:
            v4 = v; break;
        case 5:
            v5 = v; break;
        case 6:
            v6 = v; break;
        case 7:
            v7 = v; break;
        case 8:
            v8 = v; break;
        case 9:
            v9 = v; break;
        default:
            extras[i - 10] = v;
        }
    }
}

I tested it with this harness:

import java.util.Random;

public class ArrayOptimization {
    public static void main(String[] args) {
        int size = 10;
        long[] data = new long[size];
        Random r = new Random();
        for ( int i = 0; i < data.length; i++ ) {
            data[i] = r.nextLong();
        }

        Array[] a = new Array[] {
                new NoOpArray(),
                new NormalArray(size),
                new TenArray(size)
        };

        for (;;) {
            for ( int i = 0; i < a.length; i++ ) {
                testSet(a[i], data, 10000000);
                testGet(a[i], data, 10000000);
            }
        }
    }

    private static void testGet(Array a, long[] data, int iterations) {
        long nanos = System.nanoTime();
        for ( int i = 0; i < iterations; i++ ) {
            for ( int j = 0; j < data.length; j++ ) {
                data[j] = a.get(j);
            }
        }
        long stop = System.nanoTime();
        System.out.printf("%s/get took %fms%n", a.getClass().getName(), 
                (stop - nanos) / 1000000.0);
    }

    private static void testSet(Array a, long[] data, int iterations) {
        long nanos = System.nanoTime();
        for ( int i = 0; i < iterations; i++ ) {
            for ( int j = 0; j < data.length; j++ ) {
                a.set(j, data[j]);
            }
        }
        long stop = System.nanoTime();
        System.out.printf("%s/set took %fms%n", a.getClass().getName(), 
                (stop - nanos) / 1000000.0);
    }
}

The results were somewhat surprising. The TenArray performs non-trivially faster than a NormalArray does (for sizes <= 10). Subtracting the overhead (using the NoOpArray average), the TenArray takes roughly 65% of the time of the normal array. So if you know the likely maximum size of your array, I suppose it is possible to exceed the speed of a plain array. I would imagine the switch either does less bounds checking than an array access, or does it more efficiently.

NoOpArray/set took 953.272654ms
NoOpArray/get took 891.514622ms
NormalArray/set took 1235.694953ms
NormalArray/get took 1148.091061ms
TenArray/set took 1149.833109ms
TenArray/get took 1054.040459ms
NoOpArray/set took 948.458667ms
NoOpArray/get took 888.618223ms
NormalArray/set took 1232.554749ms
NormalArray/get took 1120.333771ms
TenArray/set took 1153.505578ms
TenArray/get took 1056.665337ms
NoOpArray/set took 955.812843ms
NoOpArray/get took 893.398847ms
NormalArray/set took 1237.358472ms
NormalArray/get took 1125.100537ms
TenArray/set took 1150.901231ms
TenArray/get took 1057.867936ms
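
To make the overhead subtraction concrete, here is the same calculation done by hand on the first pair of /get rows (a rough check only -- single runs of a harness like this are noisy):

NormalArray/get: 1148.09 - 891.51 = 256.58 ms of actual array work
TenArray/get:    1054.04 - 891.51 = 162.53 ms of actual switch work
ratio:           162.53 / 256.58  = ~0.63

The later pairs come out around 0.71-0.73, so the ~65% figure above is in the right ballpark.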

Now whether you can in practice get speeds faster than an array I'm not sure; obviously this way you incur any overhead associated with the interface/class/methods.
