What's wrong with this Interpolation search implementation?


Problem description

This is a common C/C++ implementation of the Interpolation Search algorithm found around the Internet. However, when used with a sorted array of some 100000 integers, the mid variable starts producing negative array indexes, causing a segmentation fault. What could the problem be?

#include <stdlib.h>
#include <stdio.h>
#include <time.h>
int interpolationSearch(int sortedArray[], int toFind, int len) {
    // Returns index of toFind in sortedArray, or -1 if not found
    int low = 0;
    int high = len - 1;
    int mid;

    while (sortedArray[low] <= toFind && sortedArray[high] >= toFind) {
        mid = low + ((toFind - sortedArray[low]) * (high - low)) /
              (sortedArray[high] - sortedArray[low]);

        if (sortedArray[mid] < toFind) {
            low = mid + 1;
        } else if (sortedArray[mid] > toFind) {
            high = mid - 1;
        } else {
            return mid;
        }
    }

    if (sortedArray[low] == toFind)
        return low;
    else
        return -1; // Not found
}

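// Comparator for qsort, ascending order. The original post does not show
// `order`; this definition is an assumption added so the program compiles.
int order(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void) {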
    srand(time(0));
    int arr[100000];
    for (int i=0; i<100000; i++) {
        arr[i] = rand()%100000;
    }

    int length = sizeof(arr)/sizeof(int);
    qsort(arr,length,sizeof(int),order);

    for (int j=0; j<10000; j++) {
        interpolationSearch(arr,rand()%100000,length);
    }
}

Solution

The sub-expression: ((toFind - sortedArray[low]) * (high - low))

... can easily evaluate to something like: ((99999-0) * (99999-0)) == 99999^2

... which is much larger than 2^31-1 (the largest value a 32-bit signed integer can hold).

Signed integer overflow is technically undefined behavior in C and C++. In practice, on typical two's-complement hardware, a product that exceeds 2^31-1 wraps around into negative numbers, hence your negative indices. If it exceeds 2^32 (which it also can here, since 99999^2 is roughly 10^10), you will most likely lose the high-order bits and end up with effectively random offsets, both positive and negative.
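
To make the arithmetic above concrete, here is a minimal sketch that evaluates the same product in 64-bit arithmetic so its true value can be inspected. The helper names wideProduct and overflowsInt32 are hypothetical and are not part of the original code.

```c
#include <stdint.h>

// Hypothetical helper: evaluates (toFind - lowValue) * (high - low) in
// 64-bit arithmetic so the true product is visible without overflow.
int64_t wideProduct(int toFind, int lowValue, int high, int low) {
    return ((int64_t)toFind - lowValue) * ((int64_t)high - low);
}

// Hypothetical helper: returns 1 if that product would not fit in a
// 32-bit signed integer, 0 otherwise.
int overflowsInt32(int toFind, int lowValue, int high, int low) {
    int64_t product = wideProduct(toFind, lowValue, high, low);
    return product > INT32_MAX || product < INT32_MIN;
}
```

For the values from the answer, wideProduct(99999, 0, 99999, 0) yields 9999800001, which is above both 2^31-1 (2147483647) and 2^32 (4294967296).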

To avoid all of this, you need to do your math carefully to make sure none of your sub-expressions can overflow. Usually the easiest way to do this is to convert to floating point, whose range is many orders of magnitude larger than that of 32-bit integers.
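
As one possible illustration, the sketch below widens the intermediate values to 64-bit integers instead of converting to floating point; this is a swapped-in alternative to the approach named above, chosen because it keeps the arithmetic exact. It assumes int is 32 bits wide. The function name interpolationSearch64 is hypothetical, and the two extra guards (empty range, equal endpoints) are additions that the original code does not have.

```c
#include <stdint.h>

// Sketch of an overflow-safe interpolation search (hypothetical name).
// Returns an index of toFind in sortedArray (ascending), or -1 if absent.
int interpolationSearch64(const int sortedArray[], int toFind, int len) {
    int low = 0;
    int high = len - 1;

    // low <= high guards against an empty range (including len == 0).
    while (low <= high &&
           sortedArray[low] <= toFind && sortedArray[high] >= toFind) {
        // Widen before subtracting so the difference cannot overflow.
        int64_t span = (int64_t)sortedArray[high] - sortedArray[low];
        if (span == 0) {
            // Both endpoints equal toFind; this also avoids dividing by zero.
            return low;
        }

        // The product is computed in 64 bits; the quotient lies in
        // [0, high - low], so converting back to int is safe.
        int64_t offset =
            ((int64_t)toFind - sortedArray[low]) * (high - low) / span;
        int mid = low + (int)offset;

        if (sortedArray[mid] < toFind) {
            low = mid + 1;
        } else if (sortedArray[mid] > toFind) {
            high = mid - 1;
        } else {
            return mid;
        }
    }

    return -1; // Not found
}
```

In the question's main, a call would look like interpolationSearch64(arr, rand()%100000, length).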

In the final analysis, interpolation such as this for binary search is usually not worth it -- the expense of computing the interpolant is typically greater than the few extra iterations of the loop that it "saves".
