二进制搜索和哈希表搜索 [英] Binary Search and Hashtable Search

查看:71
本文介绍了二进制搜索和哈希表搜索的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想找出在字典查找和数组的二进制搜索查找之间的折衷点.我期望对Dictionary进行恒定时间查询,对二进制搜索进行对数时间查询,具体取决于集合的大小,而对于较小的集合,二进制搜索的性能更好.

I wanted to find out the trade off point between a Dictionary lookup and a binary search lookup of an array. I was expecting constant time lookups for the Dictionary, and logarithmic time lookups for the binary search depending on the size of the collection, with the binary search performing better for smaller sized collections.

但是,当我看到以下结果时,我感到很惊讶:

However, I was surprised when I saw the following results:

让我感到惊讶的是:1.二分搜索首先是对数增长,然后增长得更快. 2.起初哈希值非常稳定,但随后又开始缓慢增长. 3.二进制搜索永远不会比哈希查找更好.下面是我的代码.我做错了什么?

I was surprised at: 1. Binary search is growing logarithmically at first, and then grows much faster. 2. Hash is pretty consistent at first, but then also starts growing slowly. 3. Binary search is never better than the hash lookup. Below is my code. What did I do wrong?

class Program
{
    static void Main(string[] args)
    {
        var r = new Random();
        var targets = Enumerable.Range(0, 1000 * 1000).Select(_ => r.Next(int.MaxValue)).ToList();

        for (int totalCount = 1; totalCount < 1000*1000*10; totalCount*=2)
        {
            var a = Enumerable.Range(0, totalCount).Select(_ => r.Next(int.MaxValue)).Distinct().Select(v => new thing(v)).OrderBy(t => t.value).ToArray();
            var d = a.ToDictionary(t => t.value);

            var watch = new System.Diagnostics.Stopwatch();

            {
                watch.Start();
                var found = targets.Select(t => BinarySearch(t, a)).Where(t => t != null).Count();
                watch.Stop();
                Console.WriteLine(string.Format("found {0} things out of {2} in {1} ms with binary search", found, watch.ElapsedMilliseconds, a.Length));
            }
            {
                watch.Restart();
                var found =  targets.Select(t => HashSearch(t, d)).Where(t => t != null).Count();
                watch.Stop();
                Console.WriteLine(string.Format("found {0} things out of {2} in {1} ms with hash search", found, watch.ElapsedMilliseconds, d.Keys.Count));
            }
        }
        Console.ReadLine();
    }

    static thing HashSearch(int needle, Dictionary<int, thing> hash)
    {
        if (!hash.ContainsKey(needle))
            return null;
        return hash[needle];
    }

    static thing BinarySearch(int needle, thing[] sortedHaystack)
    {
        return BinarySearch(needle, sortedHaystack, 0, sortedHaystack.Length - 1);
    }
    static thing BinarySearch(int needle, thing[] sortedHaystack, int minimum, int maximum)
    {
        if (minimum > maximum)
            return null;
        var middle = (minimum + maximum) / 2;
        if (needle == sortedHaystack[middle].value)
            return sortedHaystack[middle];
        if (needle < sortedHaystack[middle].value)
            return BinarySearch(needle, sortedHaystack, minimum, middle - 1);
        return BinarySearch(needle, sortedHaystack, middle + 1, maximum);
    }

    class thing
    {
        public int value;
        public thing(int v)
        {
            value = v;
        }
    }
}

推荐答案

(与注释中提到的差不多.)

(Pretty much as noted in comments.)

我怀疑您主要是在看到高速缓存未命中的影响.当集合很大时,您会遇到很多缓存未命中的情况-尤其是对于二进制搜索而言,二进制搜索可能需要触摸集合中的很多点才能找到元素.

I suspect you're mostly seeing the effects of cache misses. When the collection is large, you'll get a lot of cache misses - particularly with the binary search, which potentially needs to touch a lot of points in the collection to find an element.

在小尺寸的情况下,我怀疑您也看到高速缓存未命中,但是这次在您的targets列表中-以及LINQ本身的开销. LINQ很快,但是当您要做的只是对中间的一个很小的集合进行一次搜索时,它仍然很重要.

At the low sizes, I suspect you're seeing cache misses too, but this time on your targets list - and also the overhead of LINQ itself. LINQ is fast, but it can still be significant when all you're doing is performing a single search of a tiny collection in the middle.

我建议将循环重写为:

{
    // Use the same seed each time for consistency. Doesn't have to be 0.
    Random random = new Random(0);
    watch.Start();
    int found = 0;
    for (int i = 0; i < 1000 * 1000; i++)
    {
        if (BinarySearch(t, random.Next(int.MaxValue)) != null)
        {
            found++;
        }
    }
    watch.Stop();
    Console.WriteLine(string.Format
         "found {0} things out of {2} in {1} ms with binary search",
         found, watch.ElapsedMilliseconds, a.Length));
}

当然,您会遇到在循环中包括随机数生成的问题……您可能想看看使用一种比System.Random更快的随机数生成器(如果可以找到一个).或使用其他方法确定要查找的元素.

Of course you've then got the problem of including random number generation in the loop instead... you might want to look at using a random number generator which is faster than System.Random if you can find one. Or use some other way of determining which elements to look up.

哦,我个人将二进制搜索重写为使用迭代而不是递归,但这是另一回事.我希望它不会产生重大影响.

Oh, and I'd personally rewrite the binary search to use iteration rather than recursion, but that's a different matter. I wouldn't expect it to have a significant effect.

这篇关于二进制搜索和哈希表搜索的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆