Partially sorting an array in C

I have an array which looks like this:

double array[] = {4.53, 3.65, 7.43, 9.54, 0.72, 0.0};

I am wondering what method I can use to partially sort this array so that the three largest doubles come to the front. I am looking for the most efficient way to get the three highest numbers in this array.

So far I have been using qsort, but I am looking for another method that could be even faster. I know that qsort is O(n log n) in the best case and O(n^2) in the worst case, but is there a more efficient way to solve this problem? By efficient I simply mean faster, better than O(n log n).

Any help would be great

Solution

Simply maintain the three largest values seen so far: first, second, third.

   /* assumes N >= 3; swap(a, b) is assumed to be a macro
      that exchanges two doubles */
   first = array[0];
   second = array[1];
   third = array[2];

   /* scratch sort for three elements */
   if(first < second)
     swap(first, second);
   if(first < third)
     swap(first, third);
   if(second < third)
     swap(second, third);

   /* now go through, bubbling up if we have a hit */
   for(i = 3; i < N; i++)
   {
      if(third < array[i])
      {
         third = array[i];
         if(second < third)
         {
            swap(second, third);
            if(first < second)
              swap(first, second);
         }
      }
   }

I wouldn't try to scale this up to k = 4. I think three is about the limit for hardcoding it. As k gets large you need to move to a formal method.

This doesn't answer the question you actually asked, which was how to partially sort, but it seems to be what you want.
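For larger k, one common formal method is a min-heap of size k, which costs O(n log k) instead of O(n log n). The sketch below is only an illustration under that assumption; the names top_k and sift_down are made up for this example and are not from any library.

```c
#include <stddef.h>

/* Restore the min-heap property below index i in heap[0..n-1]. */
static void sift_down(double *heap, size_t n, size_t i)
{
    for (;;) {
        size_t smallest = i;
        size_t left = 2 * i + 1;
        size_t right = 2 * i + 2;

        if (left < n && heap[left] < heap[smallest])
            smallest = left;
        if (right < n && heap[right] < heap[smallest])
            smallest = right;
        if (smallest == i)
            return;

        double tmp = heap[i];
        heap[i] = heap[smallest];
        heap[smallest] = tmp;
        i = smallest;
    }
}

/*
 * Hypothetical helper: copy the k largest values of array[0..n-1] into
 * out[0..k-1], largest first. Returns the number of values written,
 * which is min(n, k). The input array is not modified.
 */
size_t top_k(const double *array, size_t n, double *out, size_t k)
{
    size_t i, count = n < k ? n : k;

    if (count == 0)
        return 0;

    /* Seed the heap with the first `count` elements. */
    for (i = 0; i < count; i++)
        out[i] = array[i];
    for (i = count / 2; i-- > 0; )
        sift_down(out, count, i);

    /* The heap root is the smallest of the current top k; replace it
       whenever a larger element turns up. */
    for (i = count; i < n; i++) {
        if (out[0] < array[i]) {
            out[0] = array[i];
            sift_down(out, count, 0);
        }
    }

    /* Heap-sort in place: moving the minimum to the back repeatedly
       leaves out[] in descending order. */
    for (i = count; i > 1; i--) {
        double tmp = out[0];
        out[0] = out[i - 1];
        out[i - 1] = tmp;
        sift_down(out, i - 1, 0);
    }
    return count;
}
```

Calling top_k(array, 6, out, 3) on the array from the question should leave 9.54, 7.43, 4.53 in out. For k = 3 the hardcoded version is likely still faster; the heap only pays off once k grows.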

If you wish to partially sort, you can use quicksort and simply return early when the pivot goes above the bound you are interested in. Say our first pivot divides the array into five and two. Ignore the last two, and only actually do the sub-sorts of the first five. But whilst it will be faster than a full quicksort, it won't be a game changer. If you can get a conservative upper bound on the k'th item (e.g. it's always going to be at most 25% between the minimum and the mean) you can quickly eliminate most of the data. If you get it wrong it's just another pass or two.

Using the quicksort method

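  /* exchanges two doubles; arguments must not have side effects */
  #define swap(a, b) do { double t_ = (a); (a) = (b); (b) = t_; } while(0)

  /* leaves the k largest elements in array[0..k-1], largest first */
  void sortfirstk_r(double *array, int N, int k)
  {
     int i = 1;
     int j = N - 1;

     if(N < 2 || k < 1)
       return;

     /* partition around array[0]: smaller elements go to the back */
     while(i <= j)
     {
        if(array[i] < array[0])
        {
          swap(array[i], array[j]);
          j--;
        }
        else
          i++;
     }
     /* put the pivot in its final place, index i - 1 */
     swap(array[0], array[i - 1]);

     /* always sort the front part; the back part only if k reaches it */
     sortfirstk_r(array, i - 1, k);
     if(i < k)
       sortfirstk_r(array + i, N - i, k - i);
  }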

(Untested and there might be bugs in the slightly tricky sort logic).

However, we've naively used the first element as the pivot. If we're sorting a large data set, and it has a normal distribution, and we want the top 1%, the z-score is 2.326. Take slightly more than 1% to allow for some sampling error, so make a first pass with the pivot set at, say, 2.3 standard deviations above the mean. That splits the distribution into two sets: the top 1% plus a bit, and the rest. We don't need to process the rest any further, and just sort the top group.
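As a rough sketch of that idea, assuming the data really is close to normal: one pass computes the mean and variance, a second pass moves everything above the cut-off to the front, and only that group is sorted. The function name sort_top_k_by_threshold and the parameter z are invented for this example, and the fallback to a full sort when the cut-off keeps too few elements is my own assumption about how to handle a bad guess.

```c
#include <stdlib.h>

/* qsort comparator: larger doubles first. */
static int cmp_desc(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x < y) - (x > y);
}

/*
 * Hypothetical helper: bring the k largest values of array[0..n-1] to the
 * front in descending order. z (assumed >= 0) is the cut-off in standard
 * deviations above the mean, e.g. 2.3 for roughly the top 1% of normally
 * distributed data.
 */
void sort_top_k_by_threshold(double *array, size_t n, size_t k, double z)
{
    double sum = 0.0, sumsq = 0.0, mean, var;
    size_t i, kept = 0;

    if (n == 0)
        return;
    if (k > n)
        k = n;

    /* First pass: mean and (population) variance. */
    for (i = 0; i < n; i++) {
        sum += array[i];
        sumsq += array[i] * array[i];
    }
    mean = sum / (double)n;
    var = sumsq / (double)n - mean * mean;
    if (var < 0.0)
        var = 0.0; /* guard against rounding error */

    /* Second pass: move everything above mean + z * sd to the front.
       Comparing squared distances avoids needing sqrt(). */
    for (i = 0; i < n; i++) {
        double d = array[i] - mean;
        if (d > 0.0 && d * d > z * z * var) {
            double tmp = array[kept];
            array[kept] = array[i];
            array[i] = tmp;
            kept++;
        }
    }

    /* The cut-off was too optimistic: fall back to sorting everything. */
    if (kept < k)
        kept = n;

    qsort(array, kept, sizeof array[0], cmp_desc);
}
```

If the cut-off keeps at least k elements, only that small group is sorted and the first k entries are the answer; otherwise the cost is one wasted pass plus an ordinary full sort.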
