Interpolation search on strings

Question

For those of you not familiar with interpolation search, it is a method for searching for a value in a sorted array that is potentially faster than binary search. You look at the first and last elements and (assuming that the contents of the array are uniformly distributed) linearly interpolate to predict the location.

For example: we have an array of length 100 with array[0]=0 and array[99]=99. If we are looking for 80, it is intuitive to try array[80] rather than array[50], and if the array is close to uniformly distributed, the expected runtime is reduced to O(log log N).

For numbers, the location to check is defined by the equation: low + ((toFind - sortedArray[low]) * (high - low)) / (sortedArray[high] - sortedArray[low]).
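A minimal Python sketch of the numeric case may make the formula concrete. It assumes a sorted list of integers; the function name and the handling of equal end values are illustrative choices, not part of the question. The probe is scaled by (high - low) so that it always stays between low and high.

```python
def interpolation_search(sorted_array, to_find):
    """Return an index of to_find in sorted_array, or -1 if it is absent."""
    low, high = 0, len(sorted_array) - 1
    # Only keep going while to_find can still lie inside [low, high].
    while low <= high and sorted_array[low] <= to_find <= sorted_array[high]:
        if sorted_array[high] == sorted_array[low]:
            # Every remaining value equals to_find; also avoids dividing by zero.
            return low
        # Linear interpolation between the two end points.
        pos = low + ((to_find - sorted_array[low]) * (high - low)) // (
            sorted_array[high] - sorted_array[low]
        )
        if sorted_array[pos] == to_find:
            return pos
        if sorted_array[pos] < to_find:
            low = pos + 1
        else:
            high = pos - 1
    return -1
```

On the array from the example above (the values 0 to 99), searching for 80 lands on index 80 with the first probe.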

A common example used to show off the intuitive nature of interpolation search is: imagine trying to find the word 'yellow' in a dictionary. You wouldn't use binary search and go to the halfway point. Rather, you would go to the expected location.

Humans can naturally linearly interpolate strings, but I can't figure out how to code it up. How do we linearly interpolate strings?

Solution

To find the "distance" between two strings, a simple method would be to look at the first letter that is different between them and assign a numeric value to each, then take the difference.

For example, the distance from "a" to "y" would be 24 and the distance from "y" to "z" would be 1, if each letter were assigned a value equal to its position in the alphabet.
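As a rough Python sketch of this first-differing-letter distance: the helper names are hypothetical, the strings are assumed to contain only lowercase a to z, and treating a missing character as 0 (so that a prefix sorts before its extensions) is an assumption added here.

```python
from itertools import zip_longest


def letter_value(ch):
    """Position in the alphabet: 'a' -> 1 ... 'z' -> 26; missing character -> 0."""
    return 0 if ch is None else ord(ch) - ord("a") + 1


def first_diff_distance(a, b):
    """Signed distance from a to b, taken at the first position where they differ."""
    for ca, cb in zip_longest(a, b):  # pads the shorter string with None
        if ca != cb:
            return letter_value(cb) - letter_value(ca)
    return 0  # the strings are identical


print(first_diff_distance("a", "y"))  # 24
print(first_diff_distance("y", "z"))  # 1
```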

A better performing method would go through a dictionary to weight the various letters by how common they are in actual words.
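One possible way to sketch this weighting in Python is to count how many dictionary words start with each letter and turn the counts into cumulative fractions. The function name is hypothetical, and the sketch assumes a non-empty list of lowercase words and looks only at the first letter.

```python
import string
from collections import Counter


def build_cumulative_weights(dictionary_words):
    """Map each letter to the fraction of words that start with an earlier letter."""
    counts = Counter(word[0] for word in dictionary_words if word)
    total = sum(counts.values())
    cumulative = {}
    running = 0
    for letter in string.ascii_lowercase:
        cumulative[letter] = running / total
        running += counts[letter]  # Counter gives 0 for letters never seen
    return cumulative


weights = build_cumulative_weights(["apple", "avocado", "banana", "cherry"])
print(weights["b"])  # 0.5: half of the words start with a letter before 'b'
```

The difference between two such fractions can then stand in for the alphabet-position difference when estimating where a word should fall.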

Another refinement would be to look at two characters - "aa" is farther from "bz" than "az" is from "ba", for example. Going beyond two characters wouldn't buy you much.
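Putting the pieces together, here is a hedged Python sketch of interpolation search over sorted strings using a two-character key. All names are illustrative. It assumes lowercase a to z only, reads two letters as a base-27 number, skips the prefix shared by both ends of the range, and falls back to bisection when the keys cannot separate the ends; those last two steps are additions of this sketch rather than something the answer prescribes.

```python
def common_prefix_len(a, b):
    """Length of the prefix shared by strings a and b."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def key(word, offset, width=2):
    """Read `width` letters of word, starting at offset, as a base-27 number.

    'a'..'z' map to 1..26; a position past the end of the word maps to 0.
    """
    value = 0
    for i in range(offset, offset + width):
        digit = ord(word[i]) - ord("a") + 1 if i < len(word) else 0
        value = value * 27 + digit
    return value


def string_interpolation_search(words, target):
    """Return the index of target in the sorted list words, or -1 if absent."""
    low, high = 0, len(words) - 1
    while low <= high and words[low] <= target <= words[high]:
        # Both ends, and so every word between them, share this prefix.
        offset = common_prefix_len(words[low], words[high])
        k_low, k_high = key(words[low], offset), key(words[high], offset)
        if k_low == k_high:
            pos = (low + high) // 2  # keys cannot tell the ends apart: bisect
        else:
            pos = low + (key(target, offset) - k_low) * (high - low) // (k_high - k_low)
        if words[pos] == target:
            return pos
        if words[pos] < target:
            low = pos + 1
        else:
            high = pos - 1
    return -1
```

With this key, "aa" to "bz" is a distance of 52 while "az" to "ba" is only 2, which matches the intuition described above.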

The reason this method isn't more popular is that it complicates the binary search algorithm for not a lot of gain. If you were to time it you might even find that standard binary search is faster; what you gain in fewer comparisons you lose in the complexity of determining distances.

Also note that the worst-case performance of this algorithm is worse than a binary search. Consider for example searching for "ae" in the list of "aa","ab","ac","ad","ae","zz" - the outlier "zz" is going to bias the search so that it's always trying the beginning of the search range. It degrades to O(n) under these conditions.
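The degradation can be seen by recording which indices each search probes on that list. This is a small illustrative Python sketch, assuming lowercase words and a key built from the first two letters; the function names are hypothetical.

```python
def key2(word):
    """First two letters as a base-27 number ('a'..'z' -> 1..26, missing -> 0)."""
    digits = [ord(c) - ord("a") + 1 for c in word[:2]]
    digits += [0] * (2 - len(digits))
    return digits[0] * 27 + digits[1]


def interpolation_probes(words, target):
    """Return the indices visited by interpolation search, in order."""
    probes = []
    low, high = 0, len(words) - 1
    while low <= high and words[low] <= target <= words[high]:
        k_low, k_high = key2(words[low]), key2(words[high])
        if k_low == k_high:
            pos = (low + high) // 2
        else:
            pos = low + (key2(target) - k_low) * (high - low) // (k_high - k_low)
        probes.append(pos)
        if words[pos] == target:
            break
        if words[pos] < target:
            low = pos + 1
        else:
            high = pos - 1
    return probes


def binary_probes(words, target):
    """Return the indices visited by plain binary search, in order."""
    probes = []
    low, high = 0, len(words) - 1
    while low <= high:
        pos = (low + high) // 2
        probes.append(pos)
        if words[pos] == target:
            break
        if words[pos] < target:
            low = pos + 1
        else:
            high = pos - 1
    return probes


words = ["aa", "ab", "ac", "ad", "ae", "zz"]
print(interpolation_probes(words, "ae"))  # [0, 1, 2, 3, 4]
print(binary_probes(words, "ae"))         # [2, 4]
```

Because "zz" stretches the key range, every interpolated probe lands on the first element of the remaining range, so the search walks the list one element at a time.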
