用于查看算法运行时间的计时器表示我的二进制搜索比线性搜索花费的时间更长 [英] A timer for seeing how long an algorithm is taking is saying my binary search takes longer than a linear search

查看:114
本文介绍了用于查看算法运行时间的计时器表示我的二进制搜索比线性搜索花费的时间更长的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

以下是gist
上的课程 https://gist.github.com/2605302

我用不同的文件对它进行了多次测试,即使二进制搜索的比较较少,所以花费的时间总是更多。出了什么问题?

I have tested it multiple times with different files and even when there is less comparisons done for binary search the time taken is ALWAYS more. What's going wrong?

public static int linerSearch ( String array [], String word, long resultsArray [])
{
    int comparisons = 0;
    int pos = -1;
    //i have started the timer where the search actualy starts
    long start = System.nanoTime ();
    for (int i = 0; i < array.length; i++){
        comparisons = comparisons + 1;
        if (array [i].equals (word)){
            pos = i;
            break;
        }
    }
    long stop = System.nanoTime ();
    long total = stop - start;
    resultsArray [0] = total;
    resultsArray [1] = (long) (long) array.length;
    resultsArray [2]= (long) (long) comparisons;
    return pos;
}

这是下一个binarySearch类

Here is the next binarySearch class

public  static int binarySearch (String [] array, String word, resultsArray []) {
    int start = 0;
    int end = array.length - 1;;
    int midPt;
    int pos = -1;
    int comparisons2 = 0;
    long start2 = System.nanoTime ();
    Arrays.sort (array);
    while (start <= end) {
        midPt = (start + end) / 2;
        comparisons2 = comparisons2 + 1;
        if (array [midPt].equalsIgnoreCase (word)) {
            pos = midPt;
            break;
        }
        else if (array [midPt].compareToIgnoreCase (word) < 0) {
            start = midPt + 1;
            comparisons2 = comparisons2 + 1;
            //camparisons2 addition was added inside this elseif and other elseif as a work around for not breaking the elseif statement tree, if it has made it two the last elseif then two camparisons after the first one will have been done
        } else if (array [midPt].compareToIgnoreCase (word) > 0)  {
            comparisons2 = comparisons2 + 2;
            end = midPt - 1;
        }
    }
    long stop2 = System.nanoTime ();
    long total2 = stop2 - start2;
    resultsArray [0] = total2;
    resultsArray [1] = (long) (long) array.length;
    resultsArray [2]= (long) (long) comparisons2;
    return pos;
}

编辑:我还应该添加我在之前排序的数组上尝试过的没有那行代码,它仍然是一个更长的时间它不应该

edit: I should also add that i tried it on an already previously sorted array without that line of code and it was still a longer time when it shouldn't have been

推荐答案

好的,我有这对你来说一劳永逸。首先,这是我使用它的二进制搜索方法:

Okay, I've got this worked out for you once and for all. First, here's the binary search method as I used it:

public static int binarySearch(String[] array, String word, long resultsArray[]) {
    int start = 0;
    int end = array.length - 1;
    int midPt;
    int pos = -1;
    int comparisons2 = 0;

    //Arrays.sort(array);

    long start2 = System.nanoTime();
    while (start <= end) {
        midPt = (start + end) / 2;
        int comparisonResult = array[midPt].compareToIgnoreCase(word);
        comparisons2++;
        if (comparisonResult == 0) {
            pos = midPt;
            break;
        } else if (comparisonResult < 0) {
            start = midPt + 1;
        } else { // comparisonResult > 0
            end = midPt - 1;
        }
    }
    long stop2 = System.nanoTime();
    long total2 = stop2 - start2;

    resultsArray[0] = total2;
    resultsArray[1] = (long) array.length;
    resultsArray[2] = (long) comparisons2;
    return pos;
}

您会注意到我通过保存比较结果减少了比较次数并使用它。

You'll notice that I reduced the number of comparisons by saving the comparison result and using that.

接下来,我下载了这个235882字的列表。它已被排序,忽略了这种情况。然后,我构建了一个测试方法,将该文件的内容加载到一个数组中,然后使用这两种搜索方法查找该列表的每个单词。然后分别平均每种方法的比较次数和次数。

Next, I downloaded this list of 235882 words. It is already sorted ignoring the case. Then, I built a test method that loads the contents of that file into an array and then uses both of those searching methods to find every word of that list. It then averages the times and numbers of comparisons for each method separately.

我发现你必须小心选择使用哪种比较方法:如果你 Arrays.sort(...)一个列表,你在二进制搜索中使用 compareToIgnoreCase ,它会失败!失败我的意思是它无法从给定的列表中找到该单词,即使该单词实际存在于那里。这是因为 Arrays.sort(...)是一个区分大小写的字符串排序器。如果你使用它,你必须使用 compareTo(...)方法。

I found out that you must be careful in choosing which comparison methods to use: if you Arrays.sort(...) a list and you use compareToIgnoreCase in binary search, it fails! By failing I mean that it cannot find the word from the given list even though the word actually exists there. That is because Arrays.sort(...) is a case-sensitive sorter for Strings. If you use that, you must use the compareTo(...) method with it.

所以,我们有两种情况


  1. 一个不区分大小写的排序列表,并使用 compareToIgnoreCase

  2. 区分大小写的列表并使用 compareTo

  1. a case-insensitively sorted list and the use of compareToIgnoreCase
  2. a case-sensitively sorted list and the use of compareTo

除了二进制搜索中的这些选项外,您还可以选择线性搜索:是否使用等于 equalsIgnoreCase 。我对所有这些案例进行了测试并对它们进行了比较。平均结果:

In addition to these options in the binary search, you also have options in the linear search: whether to use equals or equalsIgnoreCase. I ran my test for all of these cases and compared them. Average results:


  • 线性搜索等于:时间:725536 ns;比较:117941;时间/比较:6.15 ns

  • 使用 equalsIgnoreCase进行线性搜索:时间:1064334 ns;比较:117940;时间/比较:9.02 ns

  • 二进制搜索 compareToIgnoreCase :时间:1619 ns;比较:16;时间/比较:101.19 ns

  • 使用 compareTo 进行二进制搜索:时间:763 ns;比较:16;时间/比较:47.69 ns

  • Linear search with equals: time: 725536 ns; comparisons: 117941; time / comparison: 6.15 ns
  • Linear search with equalsIgnoreCase: time: 1064334 ns; comparisons: 117940; time / comparison: 9.02 ns
  • Binary search with compareToIgnoreCase: time: 1619 ns; comparisons: 16; time / comparison: 101.19 ns
  • Binary search with compareTo: time: 763 ns; comparisons: 16; time / comparison: 47.69 ns

所以,现在我们可以清楚地看到你的问题: compareToIgnoreCase 方法花费的时间是等于方法的16倍!因为平均而言,它需要二进制搜索16比较找到给定的单词,您可以在那个时间执行124次线性比较。因此,如果您使用比这更短的单词列表进行测试,线性搜索确实总是比二进制搜索更快,因为他们使用的方法不同。

So, now we can clearly see your problem: the compareToIgnoreCase method takes some 16 times as much time as the equals method! Because, on average, it takes the binary search 16 comparisons to find the given word, you can perform 124 linear comparisons in that time. So if you test with word lists shorter than that, the linear search is, indeed, always faster than the binary search due to the different methods they are using.

我实际上还计算了线性搜索能够比二进制搜索更快找到的单词数:164当使用 compareTo 方法时,使用时为614 compareToIgnoreCase 方法。在235882个单词的列表中,这个数字约为0.3%。总而言之,我认为二进制搜索比线性搜索更快仍然是安全的。 :)

I actually also counted the number of words that the linear search was able to find faster than the binary search: 164 when using the compareTo method and 614 when using the compareToIgnoreCase method. Of the the list of 235882 words, that's about 0.3 percent. So all in all I think it's still safe to say that the binary search is faster than the linear search. :)

在你问之前的最后一点:我从 binarySearch 方法中删除了排序代码,因为那是实际的一个完全不同的东西。由于您要比较两个搜索算法,如果您将排序算法的成本添加到其数据中,则对另一个算法不公平。我已经在另一个答案中发布了以下评论作为评论,但为了完整起见,我将在此处复制:

One last point before you ask: I removed the sorting code from the binarySearch method, because that's actually an entirely different thing. Since you are comparing two searching algorithms, it's not fair for the other if you add the cost of a sorting algorithm to its figures. I posted the following as a comment in another answer already, but I'll copy it here for completeness:

二进制搜索会增加排序的开销成本。因此,如果您只需要从数组中找到一个元素,线性搜索总是更快,因为排序至少需要O(n log n)时间,然后二进制搜索需要O(log n)时间,由O(n)控制记录n)操作。线性搜索在O(n)时间内执行,该时间优于O(n log n)。但是一旦你对数组进行了排序,O(log n)就好于O(n)。

Binary search has the added overhead cost of sorting. So if you only need to find one element from an array, linear search is always faster, because sorting takes at least O(n log n) time and then a binary search takes O(log n) time, dominated by the O(n log n) operation. A linear search performs in O(n) time, which is better than O(n log n). But once you have the array sorted, O(log n) is way better than O(n).

如果你坚持在<$ c中使用排序命令$ c> binarySearch 方法,您应该知道,通过我的设置排序,初始随机顺序中的长字列表平均需要超过140000000 ns或0.14秒。在那个时候你可以使用 equals 方法执行大约23000000次比较,所以你真的应该使用二进制搜索a)你的数组是随机顺序b)如果你只需要从那里找到一个或几个元素。

If you insist on having the sorting command in the binarySearch method, you should be aware that with my setup sorting that long list of words from an initially random order takes more than 140000000 ns, or 0.14 seconds, on average. In that time you could perform some 23000000 comparisons using the equals method, so you really should not use binary search if a) your array is in a random order and b) if you only ever need to find just one or a couple of elements from there.

还有一件事。在此示例中,您在String数组中搜索单词时,访问数组中项目的成本可以忽略不计,因为该数组保存在计算机的快速主内存中。但是,如果你有一大堆有序文件并且你试图从他们中找到一些东西,那么访问单个文件的成本会使每个其他计算的成本可以忽略不计。因此,二元搜索在这种情况下也会完全摇摆不定。

And one more thing. In this example, where you are searching for words in a String array, the cost of accessing an item in the array is negligible because the array is saved in the fast main memory of the computer. But if you had, say, a huge bunch of ordered files and you tried to find something from them, then the cost of accessing a single file would make the cost of every other calculation negligible instead. So binary search would totally rock in that scenario (too).

这篇关于用于查看算法运行时间的计时器表示我的二进制搜索比线性搜索花费的时间更长的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆