算法:混合MergeSort和InsertingSort执行时间 [英] Algorithms: Hybrid MergeSort and InsertionSort Execution Time

查看:106
本文介绍了算法:混合MergeSort和InsertingSort执行时间的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

SO SO社区,

我是一名CS学生,目前正在进行结合MergeSort和InsertionSort的实验.可以理解,对于某个阈值S,InsertionSort将比MergeSort具有更快的执行时间.因此,通过合并两种排序算法,可以优化总运行时间.

I am a CS student currently performing an experiment combining MergeSort and InsertionSort. It is understood that for a certain threshold, S, InsertionSort will have a quicker execution time than MergeSort. Hence, by merging both sorting algorithms, the total runtime will be optimized.

但是,在使用1000个样本大小和不同大小的S进行多次实验后,每次实验的结果均未给出明确的答案.这是获得的更好结果的图片(请注意,有一半的时间结果不是确定的):

However, after running the experiment many times, using a sample size of 1000, and varying sizes of S, the results of the experiment does not give a definitive answer each time. Here is a picture of the better results obtained (Note that half of the time the result is not as definitive):

现在,尝试使用样本大小为3500的相同算法代码:

Now, trying the same algorithm code with a sample size of 3500:

最后,尝试使用相同的算法代码,样本大小为500,000(请注意,y轴的单位是毫秒:

Finally, trying the same algorithm code with a sample size of 500,000 (Note that the y-axis is in milliseconds:

尽管从逻辑上讲,当S <= 10时,混合MergeSort会更快,因为InsertionSort没有递归的开销时间.但是,我的小型实验结果却相反.

Although logically, the Hybrid MergeSort will be faster when S<=10, as InsertionSort does not have recursive overhead time. However, the results of my mini experiment says otherwise.

目前,这些是教给我的时间复杂性:

Currently, these are the Time Complexities taught to me:

MergeSort:O(n log n)

MergeSort: O(n log n)

InsertionSort:

InsertionSort:

  • 最佳情况:θ(n)
  • 最坏情况:θ(n ^ 2)

最后,我找到了一个在线资源: https://cs. stackexchange.com/questions/68179/combining-merge-sort-and-insertion-sort 指出:

Finally, I have found an online source: https://cs.stackexchange.com/questions/68179/combining-merge-sort-and-insertion-sort that states that:

混合MergeInsertionSort:

Hybrid MergeInsertionSort:

  • 最佳情况:θ(n + n log(n/x))
  • 最坏情况:θ(nx + n log(n/x))

我想问问CS社区中是否有结果显示出权威证据,证明混合MergeSort算法在低于特定阈值S的情况下比普通MergeSort算法更有效,如果可以,为什么?

I would like to ask if there are results in the CS community that shows definitive proof that a Hybrid MergeSort algorithm will work better than a normal MergeSort algorithm below a certain threshold, S, and if so, why?

非常感谢SO社区,这可能是一个微不足道的问题,但它确实可以阐明我目前有关时间复杂性和事物的许多问题:)

Thank you so much SO community, it might be a trivial question, but it really will clarify many questions that I currently have regarding Time Complexities and stuff :)

注意:我正在使用Java进行算法编码,并且Java会将数据存储在内存中的方式可能会影响运行时间.

Note: I am using Java for the coding of the algorithm, and runtime could be affected by the way java stores data in memory..

Java代码:

 public static int mergeSort2(int n, int m, int s, int[] arr){
        int mid = (n+m)/2, right=0, left=0;
        if(m-n<=s)
            return insertSort(arr,n,m);
        else
        {
            right = mergeSort2(n, mid,s, arr);
            left = mergeSort2(mid+1,m,s, arr);
            return right+left+merge(n,m,s,arr);
        }      
    }

    public static int insertSort(int[] arr, int n, int m){
        int temp, comp=0;
        for(int i=n+1; i<= m; i++){
            for(int j=i; j>n; j--){ 
                comp++;
                comparison2++;
                if(arr[j]<arr[j-1]){
                    temp = arr[j];
                    arr[j] = arr[j-1];
                    arr[j-1] = temp;
                }
                else
                    break;
            }
        }
        return comp;
    }

    public static void shiftArr(int start, int m, int[] arr){
        for(int i=m; i>start; i--)
            arr[i] = arr[i-1];     
    }

public static int merge(int n, int m, int s, int[] arr){
        int comp=0;
        if(m-n<=s)
            return 0;
        int mid = (n+m)/2;
        int temp, i=n,  j=mid+1;
        while(i<=mid && j<=m)
        {
            comp++;
            comparison2++;


            if(arr[i] >= arr[j])
            {
                if(i==mid++&&j==m && (arr[i]==arr[j]))
                    break;
                temp = arr[j];
                shiftArr(i,j++,arr);
                arr[i] = temp;
                if(arr[i+1]==arr[i]){
                   i++;
                }
            }
            i++;


        }
        return comp;
    }

推荐答案

示例代码不是常规的合并排序.合并功能是在移动数组,而不是在原始数组和临时工作数组之间进行合并运行.

The example code isn't a conventional merge sort. The merge function is shifting an array instead of merging runs between the original array and a temporary working array and back.

我测试了自上而下和自下而上的合并排序,都花了大约42 ms == 0.042秒来排序500,000个32位整数,而图表中的表观结果要慢42倍,而不是42毫秒,是1000倍.我还用10,000,000个整数进行了测试,排序花费了超过1秒的时间.

I tested top down and bottom up merge sorts and both take about 42 ms == 0.042 seconds to sort 500,000 32 bit integers, versus the apparent results in the graph which are 1000 times slower at about 42 seconds instead of 42 ms. I also tested with 10,000,000 integers and it takes a bit over 1 second to sort.

过去,我使用C ++将自底向上合并排序与混合的自底向上合并/插入排序进行了比较,对于1600万(2 ^ 24 == 16,777,216)个32位整数,混合排序约为8%使用S == 16时,速度更快.S == 64比S == 16时稍慢.Visual Studio std :: stable_sort是自底向上合并排序的一种变体(临时数组是原始数组大小的1/2),并且插入排序,并使用S == 32.

In the past, using C++, I compared a bottom up merge sort with a hybrid bottom up merge / insertion sort, and for 16 million (2^24 == 16,777,216) 32 bit integers, the hybrid sort was about 8% faster with S == 16. S == 64 was slightly slower than S == 16. Visual Studio std::stable_sort is a variation of bottom up merge sort (the temp array is 1/2 the size of the original array) and insertion sort, and uses S == 32.

对于小数组,插入排序比合并排序更快,这是缓存局部性的组合,并且使用插入排序对小的数组进行排序所需的指令更少.对于伪随机数据和S == 16到64,插入排序的速度大约是合并排序的两倍.

For small arrays, insertion sort is quicker than merge sort, a combination of cache locality and fewer instructions needed to sort a small array with insertion sort. For pseudo random data and S == 16 to 64, insertion sort was about twice as fast as merge sort.

相对增益随着阵列尺寸的增加而减小.考虑到对自底向上合并排序的影响,当S == 16时,仅优化了4个合并通过.在我的具有2 ^ 24 == 16,777,216个元素的测试用例中,这是通过次数的4/24 = 1/6〜= 16.7%,导致大约8%的改进(因此,插入排序的速度大约是merge的两倍)为这4个通行证排序).仅合并排序的总时间约为1.52秒,而混合排序的总时间约为1.40秒,这在仅需1.52秒的过程中就获得了0.12秒的收益.对于S == 16的自上而下的合并排序,将优化4个最深层次的递归.

The relative gain diminishes as the array size increases. Considering the effect on bottom up merge sort, with S == 16, only 4 merge passes are optimized. In my test case with 2^24 == 16,777,216 elements, that's 4/24 = 1/6 ~= 16.7% of the number of passes, resulting in about an 8% improvement (so the insertion sort is about twice as fast as merge sort for those 4 passes). The total times were about 1.52 seconds for the merge only sort, and about 1.40 seconds for the hybrid sort, a 0.12 second gain on a process that only takes 1.52 seconds. For a top down merge sort, with S == 16, the 4 deepest levels of recursion would be optimized.

更新-具有时间复杂度O(n log(n))的混合就地合并排序/插入排序的示例Java代码. (注意-由于递归,辅助存储仍在堆栈上.)就位部分是在合并步骤期间通过将合并区域中的数据与合并区域中的数据交换来完成的.这不是一个稳定的排序(由于合并步骤期间的交换,所以不保留相等元素的顺序).排序500,000个整数大约需要1/8秒,因此我将其增加到1600万(2 ^ 24 == 16777216)个整数,这花了4秒多一点的时间.如果没有插入排序,则排序大约需要4.524秒,而使用S == 64的插入排序,则排序大约需要4.150秒,即大约8.8%.在C中使用基本相同的代码,改进较少:从2.88秒降低到2.75秒,约有4.5%的增益.

Update - Example java code for an hybrid in place merge sort / insertion sort with O(n log(n)) time complexity. (Note - auxiliary storage is still consumed on the stack due to recursion.) The in place part is accomplished during merge steps by swapping the data in the area merged into with the data in the area merged from. This is not a stable sort (the order of equal elements is not preserved, due to the swapping during merge steps). Sorting 500,000 integers takes about 1/8th of a second, so I increased this to 16 million (2^24 == 16777216) integers, which takes a bit over 4 seconds. Without the insertion sort, the sort takes about 4.524 seconds, and with the insertion sort with S == 64, the sort takes about 4.150 seconds, about 8.8% gain. With essentially the same code in C, the improvement was less: from 2.88 seconds to 2.75 seconds, about 4.5% gain.

package msortih;
import java.util.Random;

public class msortih {

    static final int S = 64;    // use insertion sort if size <= S

    static void swap(int[] a, int i, int j) {
        int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
    }

    // a[w:] = merged a[i:m]+a[j:n]
    // a[i:] = reordered a[w:]
    static void wmerge(int[] a, int i, int m, int j, int n, int w) {
        while (i < m && j < n)
            swap(a, w++, a[i] < a[j] ? i++ : j++);
        while (i < m)
            swap(a, w++, i++);
        while (j < n)
            swap(a, w++, j++);
    }  

    // a[w:]  = sorted    a[b:e]
    // a[b:e] = reordered a[w:]
    static void wsort(int[] a, int b, int e, int w) {
        int m;
        if (e - b > 1) {
            m = b + (e - b) / 2;
            imsort(a, b, m);
            imsort(a, m, e);
            wmerge(a, b, m, m, e, w);
        }
        else
            while (b < e)
                swap(a, b++, w++);
    }

    // inplace merge sort a[b:e]
    static void imsort(int[] a, int b, int e) {
        int m, n, w, x;
        int t;
        // if <= S elements, use insertion sort
        if (e - b <= S){
            for(n = b+1; n < e; n++){
               t = a[n];
               m = n-1;
                while(m >= b && a[m] > t){
                    a[m+1] = a[m];
                    m--;}
                a[m+1] = t;}
            return;
        }
        if (e - b > 1) {
            // split a[b:e]
            m = b + (e - b) / 2;
            w = b + e - m;
            // wsort -> a[w:e] = sorted    a[b:m]
            //          a[b:m] = reordered a[w:e]
            wsort(a, b, m, w);
            while (w - b > 2) {
                // split a[b:w], w = new mid point
                n = w;
                w = b + (n - b + 1) / 2;
                x = b + n - w;
                // wsort -> a[b:x] = sorted    a[w:n]
                //          a[w:n] = reordered a[b:x]
                wsort(a, w, n, b);
                // wmerge -> a[w:e] = merged    a[b:x]+a[n:e]
                //           a[b:x] = reordered a[w:n]
                wmerge(a, b, x, n, e, w);
            }
            // insert a[b:w] into a[b:e] using left shift
            for (n = w; n > b; --n) {
                t = a[n-1];
                for (m = n; m < e && a[m] < t; ++m)
                    a[m-1] = a[m];
                a[m-1] = t;
            }
        }
    }

    public static void main(String[] args) {
        int[] a = new int[16*1024*1024];
        Random r = new Random(0);
        for(int i = 0; i < a.length; i++)
            a[i] = r.nextInt();
        long bgn, end;
        bgn = System.currentTimeMillis();
        imsort(a, 0, a.length);
        end = System.currentTimeMillis();
        for(int i = 1; i < a.length; i++){
            if(a[i-1] > a[i]){
                System.out.println("failed");
                break;
            }
        }
        System.out.println("milliseconds " + (end-bgn));
    }
}

这篇关于算法:混合MergeSort和InsertingSort执行时间的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆