Efficient search of sorted numerical values

Question

I have an int[] array that contains values with the following properties:

  • They are sorted
  • They are unique (no duplicates)
  • They are in a known range [0..MAX)
  • MAX is typically quite a lot larger than the length of the array (say 10-100x)
  • Sometimes the numbers are evenly distributed across the range, but at other times there are quite long sequences of consecutive numbers. I estimate it is about 50/50 between the two cases.

Given this list, I want to efficiently find the index of a specific value in the array (or if the value is not present, find the next higher value).

I've already implemented a straight binary search with interval bisection that works fairly well, but I have a suspicion that the nature/distribution of the data can be exploited to converge to a solution faster.

I'm interested in optimising the average case search time, but it is important that the worst case is never worse than O(log n) as the arrays are sometimes very large.

Question: is it possible to do much better than a plain binary search in the average case?

EDIT (to clarify additional questions / comments)

  • The constant in O(log n) definitely matters. In fact, assuming that better algorithmic complexity than O(log n) isn't possible, the constant is probably the only thing that matters.
  • It's often a one-off search, so while preprocessing is possible it's probably not going to be worth it.

Recommended answer

Let's call the sorted array x and the number being searched for z.

Since you expect the values to be evenly distributed, you can use interpolation search. This is similar to binary search, but splits the index range at start + ((z - x[start]) * (end - start)) / (x[end] - x[start]).
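
As a rough illustration of the split formula above, the sketch below computes a single interpolation probe. The class and method names (`InterpolationProbe`, `probe`) are made up for this example, and it assumes `x[start] < x[end]` so that the divisor is non-zero. The product is computed in `long` because it can exceed the `int` range on large arrays.

```java
class InterpolationProbe {
    // Hypothetical helper: index at which interpolation search would split [start, end].
    // Assumes x is sorted, start < end and x[start] < x[end].
    static int probe(int[] x, int start, int end, int z) {
        long numerator = ((long) z - x[start]) * (end - start);
        long denominator = (long) x[end] - x[start];
        return start + (int) (numerator / denominator);
    }

    public static void main(String[] args) {
        int[] evenlySpaced = {0, 10, 20, 30, 40};
        // (30 - 0) * (4 - 0) / (40 - 0) = 3, so the probe lands exactly on the value 30.
        System.out.println(probe(evenlySpaced, 0, 4, 30)); // prints 3

        int[] clustered = {0, 1, 2, 3, 40};
        // (3 - 0) * 4 / 40 = 0: on clustered data the estimate can be far off.
        System.out.println(probe(clustered, 0, 4, 3)); // prints 0
    }
}
```

The second call shows why interpolation alone is risky for the "long runs of consecutive numbers" case described in the question: the estimate lands at index 0 while the value sits at index 3.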

To guarantee a worst-case running time of O(log n), you have to combine interpolation search with binary search, alternating one step of binary search with one step of interpolation search:

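public int search(int[] values, int z) {
    if (values.length == 0)
        return -1;

    int start = 0;
    int end = values.length - 1;

    if (values[start] == z)
        return start;
    if (values[end] == z)
        return end;
    // Outside the stored range: z is absent, and interpolating would probe out of bounds.
    if (z < values[start] || z > values[end])
        return -1;

    // Invariant from here on: values[start] < z < values[end]
    boolean interpolation = true;

    while (end - start > 1) {
        int mid;
        if (interpolation) {
            // Interpolation step; long arithmetic avoids int overflow in the product.
            long offset = ((long) z - values[start]) * (end - start)
                    / ((long) values[end] - values[start]);
            // Clamp strictly inside (start, end) so the range always shrinks.
            mid = (int) Math.max(start + 1, Math.min(end - 1, start + offset));
        } else {
            // Bisection step.
            mid = start + (end - start) / 2;
        }
        int v = values[mid];
        if (v == z)
            return mid;
        else if (v > z)
            end = mid;
        else
            start = mid;
        interpolation = !interpolation;
    }
    return -1; // z is not in the array
}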
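
The question also asks for the index of the next higher value when z is absent, which the method above does not return. The following is a minimal sketch of such a variant, not part of the original answer: the names `HybridLowerBound` and `lowerBound` are invented here, and returning `values.length` when no higher value exists is an assumed convention.

```java
class HybridLowerBound {
    // Sketch: index of z, or of the next higher value if z is absent.
    // Returns values.length when every element is smaller than z (assumed convention).
    static int lowerBound(int[] values, int z) {
        int n = values.length;
        if (n == 0 || z > values[n - 1]) return n;
        if (z <= values[0]) return 0;

        int start = 0, end = n - 1; // invariant: values[start] < z <= values[end]
        boolean interpolation = true;
        while (end - start > 1) {
            int mid;
            if (interpolation) {
                // Interpolation step, computed in long to avoid overflow.
                long offset = ((long) z - values[start]) * (end - start)
                        / ((long) values[end] - values[start]);
                // Clamp strictly inside (start, end) so the range always shrinks.
                mid = (int) Math.max(start + 1, Math.min(end - 1, start + offset));
            } else {
                // Bisection step.
                mid = start + (end - start) / 2;
            }
            if (values[mid] < z) start = mid; else end = mid;
            interpolation = !interpolation;
        }
        return end; // smallest index whose value is >= z
    }

    public static void main(String[] args) {
        int[] values = {2, 3, 4, 5, 40, 90};
        System.out.println(lowerBound(values, 5));   // prints 3 (exact match)
        System.out.println(lowerBound(values, 6));   // prints 4 (next higher value is 40)
        System.out.println(lowerBound(values, 100)); // prints 6 (no higher value exists)
    }
}
```

Dropping the early exit on an exact match keeps the loop to one comparison per probe, at the cost of always running until the range has width one.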

Since every second iteration of the while loop is a binary search step, the loop uses at most about twice as many iterations as a plain binary search would, so it stays O(log n). Since every other iteration is an interpolation step, the algorithm should shrink the interval quickly if the input has the desired properties.
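
One way to check this argument empirically is to count loop iterations for plain bisection and for the alternating scheme on the same data. The sketch below does that under stated assumptions: the class `ProbeCount`, the method `iterations`, and the synthetic evenly spaced test data are all invented for illustration, and z is assumed to lie in `(values[0], values[n-1]]`.

```java
class ProbeCount {
    // Counts loop iterations of plain bisection (alternate = false) or of the
    // alternating interpolation/bisection scheme (alternate = true).
    // Assumes values is sorted without duplicates and values[0] < z <= values[n-1].
    static int iterations(int[] values, int z, boolean alternate) {
        int start = 0, end = values.length - 1, count = 0;
        boolean interpolation = alternate;
        while (end - start > 1) {
            int mid;
            if (interpolation) {
                long offset = ((long) z - values[start]) * (end - start)
                        / ((long) values[end] - values[start]);
                mid = (int) Math.max(start + 1, Math.min(end - 1, start + offset));
            } else {
                mid = start + (end - start) / 2;
            }
            if (values[mid] < z) start = mid; else end = mid;
            if (alternate) interpolation = !interpolation;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int[] even = new int[n];
        for (int i = 0; i < n; i++) even[i] = 37 * i; // hypothetical evenly spread data
        int z = even[123_456];
        System.out.println("bisection:   " + iterations(even, z, false));
        System.out.println("alternating: " + iterations(even, z, true));
    }
}
```

Real data with long consecutive runs will behave differently from this evenly spaced input, so timing on representative arrays is still needed before drawing conclusions about the constant factor.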
