给定运行时数据,如何知道排序程序是使用冒泡排序还是插入排序? [英] Given the runtime data, how to know if sorting program uses bubble sort or insertion sort?

查看:62
本文介绍了给定运行时数据,如何知道排序程序是使用冒泡排序还是插入排序?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我测量了一个排序程序/算法,并根据运行时数据将其缩小为两种排序算法-气泡排序和插入排序.

是否有办法确定它是哪一个?当然不知道代码.

它们都具有相同的时间复杂度,我没主意了.

时间复杂度数据:

  • 排序:O(n)1000个数字所需的时间= 0.0047s
  • 未排序的随机数:O(n ^ 2)1000个数字所需的时间= 0.021秒
  • 降序:O(n ^ 2)1000个数字所需的时间= 0.04s

提前谢谢!

解决方案

  1. 您的1000个排序元素太低

    测量的时间太短,无法代表有效的测量结果(因为大部分时间可能不是排序本身所用,而是窗口的初始化,打开文件等...).

    您需要至少100毫秒或更长的时间(理想情况下为1秒).

  2. 如果您有权访问正在排序的数据

    您可以引入对每种类型的排序都具有挑战性的数据集(并且从推断使用算法的时间开始)……因此,例如,冒泡排序对于反向排序的排序数组而言是最慢的……因此,将经过排序的数据升序传递给降序,随机和比较时间.让我们调用时间 tasc,tdes,trnd 并假设升序排序,那么如果涉及气泡排序,则应该为:

      tasc O(n)<trnd<tdes O(n ^ 2) 

    如此:

      tasc * n == tdes + margin_of错误 

    所以只需测试 tdes/tasc 接近 n ...就有一定的误差...

    因此,您只需要创建一个样本数据,该数据对于特定类型的排序将很难,而对于其他类型则很难...并且从发现时就开始检查是否存在这种情况,直到找到算法为止.

    这里有一些数据(所有时间都在 [ms] 中),我对矿井气泡排序和asc排序数据进行了测试:

      n tasc tdesc tasc * n1000 0.00321 2.96147 3.2057502000 0.00609 11.76799 12.1818554000 0.01186 45.58834 47.445111 

    更明确的是,我们是否具有复杂性运行时 O(n)

      t(O(n))= c * n 

    转换为复杂度为 O(n ^ 2)的运行时(假设相同的恒定时间 c ):

      t(O(n ^ 2))= c * n * n = t(O(n))* n 

    通过这种方式,您可以将具有不同复杂度的时间进行比较,只需将所有测得的时间转换为单个常见的复杂度...

  3. 如果可以选择排序的数据大小

    然后,正如注释中所提到的,您可以推断出n n 增加(加倍)的时间增长率,从中您可以估算复杂度,并从中可以知道使用了哪种算法./p>

    因此,假设从#2 开始的测量时间,则对于 O(n),恒定时间 c 应该相同,对于tasc( O(n)):

      n tasc c = tasc/n1000 0.00321 0.0000032102000 0.00609 0.0000030454000 0.01186 0.000002965 

    和tdesc( O(n ^ 2)):

      n tdesc tdesc/n ^ 21000 2.96147 0.000002961470002000 11.76799 0.000002941997504000 45.58834 0.00000284927125 

    正如您所见,两次 tasc,tdesc c 几乎相同,这意味着它们遵守了估计的复杂度 O(n),O(n ^ 2)

但是,如果不知道被测App的功能,就很难确定,因为排序可能先于处理……例如,可能会扫描数据以检测可行的形式(排序,随机,几乎排序...)在 O(n)中,并连同结果以及数据大小一起,它可能会选择要使用的排序算法...因此,您的测量可能会测量使结果无效的不同例程...

[edit1]我有一个自动检测复杂性的疯狂想法

简单地通过测试所有被测量时间与其对应的 n 之间的恒定时间常数是否大致相同...下面是简单的 C ++/VCL 代码:

 <代码>//$$ ----表格CPP ----//---------------------------------------------------------------------------#include< vcl.h>#include< math.h>#pragma hdrstop#include"Unit1.h";//---------------------------------------------------------------------------#pragma软件包(smart_init)#pragma资源"* .dfm"TForm1 * Form1;//---------------------------------------------------------------------------double factorial [] =//n [-],t [ms]{11,0.008,12,0.012,13,0.013,14,0.014,15,0.016,16,0.014,17,0.015,18,0.017,19,0.019,20,0.016,21,0.017,22,0.019,23,0.021,24,0.023,25,0.025,26,0.027,27,0.029,28,0.032,29,0.034,30,0.037,31,0.039,32,0.034,33,0.037,34,0.039,35,0.041,36,0.039,37,0.041,38,0.044,39,0.046,40,0.041,41,0.044,42,0.046,43,0.049,44,0.048,45,0.050,46,0.054,47,0.056,48,0.056,49,0.060,50,0.063,51,0.066,52,0.065,53,0.069,54,0.072,55,0.076,56,0.077,57,0.162,58,0.095,59,0.093,60,0.089,61,0.093,62,0.098,63,0.096,64,0.090,65,0.100,66,0.104,67,0.111,68,0.100,69,0.121,70,0.109,71,0.119,72,0.104,73,0.124,74,0.113,75,0.118,76,0.118,77,0.123,78,0.129,79,0.133,80,0.121,81,0.119,82,0.131,83,0.150,84,0.141,85,0.148,86,0.154,87,0.163,88,0.211,89,0.151,90,0.157,91,0.166,92,0.161,93,0.169,94,0.173,95,0.188,96,0.181,97,0.187,98,0.194,99,0.201,100,0.185,101,0.191,102,0.202,103,0.207,104,0.242,105,0.210,106,0.215,107,0.221,108,0.217,109,0.226,110,0.232,111,0.240,112,0.213,113,0.231,114,0.240,115,0.252,116,0.248,117,0.598,118,0.259,119,0.261,120,0.254,121,0.263,122,0.270,123,0.281,124,0.290,125,0.322,126,0.303,127,0.313,128,0.307,0,0.000};//---------------------------------------------------------------------------双重sort_asc [] ={1000,0.00321,2000,0.00609,4000,0.01186,0,0.000};//---------------------------------------------------------------------------双重sort_desc [] ={1000、2.96147,2000,11.76799,4000,45.58834,0,0.000};//---------------------------------------------------------------------------双重sort_rand [] ={1000、3.205750,2000,12.181855,4000,47.445111,0,0.000};//---------------------------------------------------------------------------double div(double a,double b){return(fabs(b)> 1e-10)?a/b:0.0;}//---------------------------------------------------------------------------AnsiString get_complexity(double * dat)//期望dat [] = {n0,t(n0),n1,t(n1),...,0,0}{AnsiString O ="O(?)";我双倍t,n,c,c0,c1,a,dc = 1e + 10;#为(e = 1,i = 0; dat [i]> 0.5;){n = dat [i];i ++;t = dat [i];i ++;#如果((c< = 0.0)|||(n< 2.0))继续,请定义测试对象;如果(e){e = 0;c0 = c;c1 = c;}如果(c0> c)c0 = c;如果(c1 <c)c1 = c;} a = fabs(1.0-div(c0,c1));如果(dc> = a){dc = a;O = s;}测试c = div(t,n);testend("O(n)");测试c = div(t,n * n);testend("O(n ^ 2)");测试c = div(t,n * n * n);testend("O(n ^ 3)");测试c = div(t,n * n * n * n);testend("O(n ^ 4)");测试a = log(n);c = div(t,a);testend("O(log(n))");测试a = log(n);c = div(t,a * a);testend("O(log ^ 2(n))");测试a = log(n);c = div(t,a * a * a);testend("O(log ^ 3(n))");测试a = log(n);c = div(t,a * a * a * a);testend("O(log ^ 4(n))");测试a = log(n);c = div(t,n * a);testend("O(n.log(n))");测试a = log(n);c = div(t,n * n * a);testend("O(n ^ 2.log(n))");测试a = log(n);c = div(t,n * n * n * a);testend("O(n ^ 3.log(n))");测试a = log(n);c = div(t,n * n * n * n * a);testend("O(n ^ 4.log(n))");测试a = log(n);c = div(t,n * a * a);testend("O(n.log ^ 2(n))");测试a = log(n);c = div(t,n * n * a * a);testend("O(n ^ 2.log ^ 2(n))");测试a = log(n);c = div(t,n * n * n * a * a);testend("O(n ^ 3.log ^ 2(n))");测试a = log(n);c = div(t,n * n * n * n * a * a);testend("O(n ^ 4.log ^ 2(n))");测试a = log(n);c = div(t,n * a * a * a);testend("O(n.log ^ 3(n))");测试a = log(n);c = div(t,n * n * a * a * a);testend("O(n ^ 2.log ^ 3(n))");测试a = log(n);c = div(t,n * n * n * a * a * a);testend("O(n ^ 3.log ^ 3(n))");测试a = log(n);c = div(t,n * n * n * n * a * a * a);testend("O(n ^ 4.log ^ 3(n))");测试a = log(n);c = div(t,n * a * a * a * a);testend("O(n.log ^ 4(n))");测试a = log(n);c = div(t,n * n * a * a * a * a);testend("O(n ^ 2.log ^ 4(n))");测试a = log(n);c = div(t,n * n * n * a * a * a * a);testend("O(n ^ 3.log ^ 4(n))");测试a = log(n);c = div(t,n * n * n * n * a * a * a * a);testend("O(n ^ 4.log ^ 4(n))");#undef testend#undef testbeg返回O + AnsiString().sprintf(错误=%.6lf",dc);}//---------------------------------------------------------------------------__fastcall TForm1 :: TForm1(TComponent *所有者):TForm(所有者){mm_log-> Lines-> Clear();mm_log-> Lines-> Add("factorial" + get_complexity(factorial));mm_log-> Lines-> Add(" sort asc" + get_complexity(sort_asc));;mm_log-> Lines-> Add(" sort desc" + get_complexity(sort_desc)));mm_log-> Lines-> Add(" sort rand && quot; + get_complexity(sort_rand)));}//------------------------------------------------------------------------- 

与我的相关时间测量快速精确的bigint阶乘,我只使用了8ms以上的较大时间,并且从上面输出的排序度量:

 阶乘O(n.log ^ 2(n))错误= 0.665782排序asc O(n)错误= 0.076324排序des O(n ^ 2)错误= 0.037886排序rand O(n ^ 2)错误= 0.075000 

该代码仅测试了少数受支持的复杂性,并输出了具有最低错误的代码(不同 n 之间<​​code> c 恒定时间的变化)...

只需忽略VCL内容,然后将AnsiString转换为所需的任何字符串或输出...

I measured a sorting program / algorithm and based on the runtime data, I have narrowed it down to two sorting algorithms - bubble sort and insertion sort.

Is there a way to know for sure which one it is? Without knowing the code of course.

They both have the same time complexity and I'm out of ideas.

Time complexity data:

  • Sorted: O(n) Time taken for 1000 numbers = 0.0047s
  • Random unsorted: O(n^2) Time taken for 1000 numbers = 0.021s
  • Descending order: O(n^2) Time taken for 1000 numbers = 0.04s

Thanks in advance!

解决方案

  1. Your 1000 elements for sort is too low

    the measured times are too low to represent a valid measurement (as majority of the time might not be used by the sort itself but initialization of window, openning files etc ...).

    you need times at least 100ms or more (1 sec is ideal).

  2. if you have access to data that is being sorted

    You can introduce data set that will be challenging for each type of sort (and from times infer used algo)... so for example bubble sort is slowest for sorted array in reverse order ... so pass sorted data ascending and descending and random and compare times. let call the times tasc,tdes,trnd and assuming ascending sort then if bubble sort is involved it should be:

    tasc O(n) < trnd  < tdes O(n^2)
    

    so:

    tasc*n == tdes + margin_of error
    

    so just test tdes/tasc is close to n ... with some margin for error ...

    so you just need to create a sample data that will be hard for specific type of sort and not for the others ... and from the times detect if it is the case or not until you find algo used.

    Here some data (all times are in [ms]) I tested on mine bubble sort and asc ordered data:

       n     tasc    tdesc    tasc*n
    1000  0.00321  2.96147  3.205750
    2000  0.00609 11.76799 12.181855
    4000  0.01186 45.58834 47.445111
    

    to be more clear if we have runtime for complexity O(n)

    t(O(n)) = c*n
    

    to convert to runtime with complexity O(n^2) (assuming the same constant time c):

    t(O(n^2)) = c*n*n = t(O(n)) * n
    

    This way you can compare times with different complexities you just need to convert all measured time into single common complexity...

  3. if you can chose sorted data size

    then as it was mentioned in comments you can infer the growth rate of the times with increasing n (doubling) from that you can estimate complexity and from that you can tell which algo was used.

    So lets assume measured times from #2 then for O(n) the constant time c should be the same so for tasc (O(n)):

       n     tasc    c=tasc/n
    1000  0.00321 0.000003210
    2000  0.00609 0.000003045 
    4000  0.01186 0.000002965 
    

    and for tdesc (O(n^2)):

       n     tdesc        tdesc/n^2
    1000   2.96147 0.00000296147000
    2000  11.76799 0.00000294199750
    4000  45.58834 0.00000284927125
    

    as you can see the c is more or less the same for both times tasc,tdesc which means they comply their estimated complexities O(n),O(n^2)

However without knowing what the tested App does is hard to be sure as sorting might be preceded by processing ... for example data might be scanned to detect the form (sorted, random, almost sorted ...) which is doable in O(n) and with the result along with data size it might chose which sorting algo to use ... So your measurements might measure different routines invalidating results ...

[edit1] I had an insane idea of detecting the complexity automaticaly

Simply by testing if constant time constant is more or less the same between all meassured times versus their corresponding n... Here simple C++/VCL code:

//$$---- Form CPP ----
//---------------------------------------------------------------------------
#include <vcl.h>
#include <math.h>
#pragma hdrstop
#include "Unit1.h"
//---------------------------------------------------------------------------
#pragma package(smart_init)
#pragma resource "*.dfm"
TForm1 *Form1;
//---------------------------------------------------------------------------
double factorial[]= // n[-],t[ms]
    {
     11,0.008,
     12,0.012,
     13,0.013,
     14,0.014,
     15,0.016,
     16,0.014,
     17,0.015,
     18,0.017,
     19,0.019,
     20,0.016,
     21,0.017,
     22,0.019,
     23,0.021,
     24,0.023,
     25,0.025,
     26,0.027,
     27,0.029,
     28,0.032,
     29,0.034,
     30,0.037,
     31,0.039,
     32,0.034,
     33,0.037,
     34,0.039,
     35,0.041,
     36,0.039,
     37,0.041,
     38,0.044,
     39,0.046,
     40,0.041,
     41,0.044,
     42,0.046,
     43,0.049,
     44,0.048,
     45,0.050,
     46,0.054,
     47,0.056,
     48,0.056,
     49,0.060,
     50,0.063,
     51,0.066,
     52,0.065,
     53,0.069,
     54,0.072,
     55,0.076,
     56,0.077,
     57,0.162,
     58,0.095,
     59,0.093,
     60,0.089,
     61,0.093,
     62,0.098,
     63,0.096,
     64,0.090,
     65,0.100,
     66,0.104,
     67,0.111,
     68,0.100,
     69,0.121,
     70,0.109,
     71,0.119,
     72,0.104,
     73,0.124,
     74,0.113,
     75,0.118,
     76,0.118,
     77,0.123,
     78,0.129,
     79,0.133,
     80,0.121,
     81,0.119,
     82,0.131,
     83,0.150,
     84,0.141,
     85,0.148,
     86,0.154,
     87,0.163,
     88,0.211,
     89,0.151,
     90,0.157,
     91,0.166,
     92,0.161,
     93,0.169,
     94,0.173,
     95,0.188,
     96,0.181,
     97,0.187,
     98,0.194,
     99,0.201,
    100,0.185,
    101,0.191,
    102,0.202,
    103,0.207,
    104,0.242,
    105,0.210,
    106,0.215,
    107,0.221,
    108,0.217,
    109,0.226,
    110,0.232,
    111,0.240,
    112,0.213,
    113,0.231,
    114,0.240,
    115,0.252,
    116,0.248,
    117,0.598,
    118,0.259,
    119,0.261,
    120,0.254,
    121,0.263,
    122,0.270,
    123,0.281,
    124,0.290,
    125,0.322,
    126,0.303,
    127,0.313,
    128,0.307,
      0,0.000
    };
//---------------------------------------------------------------------------
double sort_asc[]=
    {
    1000,0.00321,
    2000,0.00609,
    4000,0.01186,
       0,0.000
    };
//---------------------------------------------------------------------------
double sort_desc[]=
    {
    1000, 2.96147,
    2000,11.76799,
    4000,45.58834,
       0,0.000
    };
//---------------------------------------------------------------------------
double sort_rand[]=
    {
    1000, 3.205750,
    2000,12.181855,
    4000,47.445111,
       0,0.000
    };
//---------------------------------------------------------------------------
double div(double a,double b){ return (fabs(b)>1e-10)?a/b:0.0; }
//---------------------------------------------------------------------------
AnsiString get_complexity(double *dat)  // expect dat[] = { n0,t(n0), n1,t(n1), ... , 0,0 }
    {
    AnsiString O="O(?)";
    int i,e;
    double t,n,c,c0,c1,a,dc=1e+10;
    #define testbeg for (e=1,i=0;dat[i]>0.5;){ n=dat[i]; i++; t=dat[i]; i++;
    #define testend(s) if ((c<=0.0)||(n<2.0)) continue; if (e){ e=0; c0=c; c1=c; } if (c0>c) c0=c; if (c1<c) c1=c; } a=fabs(1.0-div(c0,c1)); if (dc>=a){ dc=a; O=s; }


    testbeg;            c=div(t,n);                 testend("O(n)");
    testbeg;            c=div(t,n*n);               testend("O(n^2)");
    testbeg;            c=div(t,n*n*n);             testend("O(n^3)");
    testbeg;            c=div(t,n*n*n*n);           testend("O(n^4)");
    testbeg; a=log(n);  c=div(t,a);                 testend("O(log(n))");
    testbeg; a=log(n);  c=div(t,a*a);               testend("O(log^2(n))");
    testbeg; a=log(n);  c=div(t,a*a*a);             testend("O(log^3(n))");
    testbeg; a=log(n);  c=div(t,a*a*a*a);           testend("O(log^4(n))");
    testbeg; a=log(n);  c=div(t,n*a);               testend("O(n.log(n))");
    testbeg; a=log(n);  c=div(t,n*n*a);             testend("O(n^2.log(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*a);           testend("O(n^3.log(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*n*a);         testend("O(n^4.log(n))");
    testbeg; a=log(n);  c=div(t,n*a*a);             testend("O(n.log^2(n))");
    testbeg; a=log(n);  c=div(t,n*n*a*a);           testend("O(n^2.log^2(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*a*a);         testend("O(n^3.log^2(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*n*a*a);       testend("O(n^4.log^2(n))");
    testbeg; a=log(n);  c=div(t,n*a*a*a);           testend("O(n.log^3(n))");
    testbeg; a=log(n);  c=div(t,n*n*a*a*a);         testend("O(n^2.log^3(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*a*a*a);       testend("O(n^3.log^3(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*n*a*a*a);     testend("O(n^4.log^3(n))");
    testbeg; a=log(n);  c=div(t,n*a*a*a*a);         testend("O(n.log^4(n))");
    testbeg; a=log(n);  c=div(t,n*n*a*a*a*a);       testend("O(n^2.log^4(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*a*a*a*a);     testend("O(n^3.log^4(n))");
    testbeg; a=log(n);  c=div(t,n*n*n*n*a*a*a*a);   testend("O(n^4.log^4(n))");

    #undef testend
    #undef testbeg
    return O+AnsiString().sprintf(" error = %.6lf",dc);
    }
//---------------------------------------------------------------------------
__fastcall TForm1::TForm1(TComponent* Owner):TForm(Owner)
    {
    mm_log->Lines->Clear();
    mm_log->Lines->Add("factorial "+get_complexity(factorial));
    mm_log->Lines->Add("sort asc  "+get_complexity(sort_asc));
    mm_log->Lines->Add("sort desc "+get_complexity(sort_desc));
    mm_log->Lines->Add("sort rand "+get_complexity(sort_rand));
    }
//-------------------------------------------------------------------------

with relevant times measurements of mine fast exact bigint factorial where I simply used only the bigger times above 8ms, and also the sorting measurement from above which outputs this:

factorial O(n.log^2(n)) error = 0.665782
sort asc  O(n) error = 0.076324
sort desc O(n^2) error = 0.037886
sort rand O(n^2) error = 0.075000

The code just test few supported complexities and outputs the one that has lowest error (variation of c constant time between different n)...

Just ignore the VCL stuff and convert the AnsiString into any string or output you want ...

这篇关于给定运行时数据,如何知道排序程序是使用冒泡排序还是插入排序?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆