Partially sorting an array in C

Question

I have an array which looks like this:

double array[] = {4.53, 3.65, 7.43, 9.54, 0.72, 0.0};

I am just wondering what method I can use to partially sort this array to bring the top three biggest doubles to the front. I am looking for the most efficient method to get the top three highest numbers in this array.

So far I have been using qsort, but I am looking for another method that could be even faster. I know that qsort is O(n log n) in the best case and O(n^2) in the worst case, but is there a more efficient way to solve this problem? By efficient I simply mean faster, better than O(n log n).

Any help would be great.

Recommended answer

Simply maintain first, second, and third.

   /* assumes N >= 3; swap exchanges two doubles in place */
   #define swap(a, b) do { double tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

   double first = array[0];
   double second = array[1];
   double third = array[2];
   int i;

   /* scratch sort for three elements */
   if (first < second)
      swap(first, second);
   if (first < third)
      swap(first, third);
   if (second < third)
      swap(second, third);

   /* now go through, bubbling up if we have a hit */
   for (i = 3; i < N; i++)
   {
      if (third < array[i])
      {
         third = array[i];
         if (second < third)
         {
            swap(second, third);
            if (first < second)
               swap(first, second);
         }
      }
   }

I wouldn't try to scale up to k = four. I think three is about the limit for hardcoding it. As k gets large you need to move to a formal method.
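
One possible "formal method" for larger k is a bounded min-heap, which needs roughly O(n log k) comparisons. The sketch below is only an illustration of that idea, not something taken from the answer: the names `top_k` and `sift_down` are hypothetical, and it assumes the caller supplies an output buffer with room for k doubles.

```c
#include <stddef.h>

/* Restore the min-heap property for heap[0..size-1], starting at index i. */
static void sift_down(double *heap, size_t size, size_t i)
{
    for (;;) {
        size_t smallest = i;
        size_t left = 2 * i + 1;
        size_t right = 2 * i + 2;
        if (left < size && heap[left] < heap[smallest])
            smallest = left;
        if (right < size && heap[right] < heap[smallest])
            smallest = right;
        if (smallest == i)
            return;
        double tmp = heap[i];
        heap[i] = heap[smallest];
        heap[smallest] = tmp;
        i = smallest;
    }
}

/* Hypothetical helper: copy the k largest values of array[0..n-1] into
   out[0..k-1], largest first. Returns how many values were written. */
size_t top_k(const double *array, size_t n, double *out, size_t k)
{
    size_t i, count;

    if (k > n)
        k = n;
    if (k == 0)
        return 0;

    /* seed the heap with the first k values, then heapify */
    for (i = 0; i < k; i++)
        out[i] = array[i];
    for (i = k / 2; i-- > 0; )
        sift_down(out, k, i);

    /* out[0] is the smallest kept value; replace it when a larger one appears */
    for (i = k; i < n; i++) {
        if (out[0] < array[i]) {
            out[0] = array[i];
            sift_down(out, k, 0);
        }
    }

    /* move the minimum to the end repeatedly, leaving descending order */
    for (count = k; count > 1; count--) {
        double tmp = out[0];
        out[0] = out[count - 1];
        out[count - 1] = tmp;
        sift_down(out, count - 1, 0);
    }
    return k;
}
```

For k = 3 the hardcoded version above is likely still simpler and faster; the heap only starts to pay off once k is too large to unroll by hand.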

This doesn't answer the question you actually asked, which was how to partially sort, but it seems to be what you want.

If you wish to partially sort, you can use quicksort and simply return early when the pivot goes above the bound you are interested in. Say our first pivot divides the array into five and two. Ignore the last two, and only actually do the sub-sorts of the first five. But whilst it will be faster than a full quicksort, it won't be a game changer. If you can get a conservative upper bound on the k'th item (e.g. it's always going to be at most 25% of the way between the minimum and the mean), you can quickly eliminate most of the data. If you get it wrong, it's just another pass or two.

Using the quicksort method

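  /* exchange two ints through pointers */
  static void swap_int(int *a, int *b)
  {
     int tmp = *a;
     *a = *b;
     *b = tmp;
  }

  /* Sort only the first k positions: array[0..k-1] ends up holding the k
     smallest values in ascending order. Flip the comparison in the loop
     to bring the largest values to the front instead. */
  void sortfirstk_r(int *array, int N, int k)
  {
     int j = N - 1;
     int i = 1;

     if (N < 2 || k < 1)
        return;

     /* partition around the pivot array[0]: larger values go to the back */
     while (i <= j)
     {
        if (array[0] < array[i])
           swap_int(&array[i], &array[j--]);
        else
           i++;
     }
     /* the pivot belongs at index i - 1, between the two partitions */
     swap_int(&array[0], &array[i - 1]);

     sortfirstk_r(array, i - 1, k < i - 1 ? k : i - 1);
     if (i < k)
        sortfirstk_r(array + i, N - i, k - i);
  }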

(Untested and there might be bugs in the slightly tricky sort logic).

However, we've naively used the first element as the pivot. If we're sorting a large data set, it has a normal distribution, and we want the top 1%, the z-score is 2.326. Take slightly more than 1% to allow for some sampling error, and make a first pass with the pivot set at, say, 2.3 standard deviations above the mean. That splits the distribution into two sets: the top 1% plus a bit, and the rest. We don't need to process the rest any further, and just sort the top group.
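
That first pass might look something like the sketch below. It is an illustration under assumptions rather than the answer's own code: the name `partition_above_threshold` is hypothetical, z is assumed to be non-negative, and the standard deviation is the population one computed from the data itself. Comparing squared deviations avoids calling `sqrt`.

```c
#include <stddef.h>

/* Hypothetical helper: move every value greater than mean + z * sd to the
   front of array[0..n-1] and return how many there are. Assumes z >= 0. */
size_t partition_above_threshold(double *array, size_t n, double z)
{
    double mean = 0.0;
    double variance = 0.0;
    size_t i, count = 0;

    if (n == 0)
        return 0;

    /* pass 1: mean */
    for (i = 0; i < n; i++)
        mean += array[i];
    mean /= (double)n;

    /* pass 2: population variance */
    for (i = 0; i < n; i++)
        variance += (array[i] - mean) * (array[i] - mean);
    variance /= (double)n;

    /* pass 3: x > mean + z * sd  <=>  d > 0 and d * d > z * z * variance */
    for (i = 0; i < n; i++) {
        double d = array[i] - mean;
        if (d > 0.0 && d * d > z * z * variance) {
            double tmp = array[count];
            array[count] = array[i];
            array[i] = tmp;
            count++;
        }
    }
    return count;
}
```

If the returned count is at least k, only `array[0..count-1]` needs sorting; if it falls short, the threshold was too optimistic and another pass with a smaller z is needed.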
