How to make this sorting program in C much faster for large input sets


Problem description




This sort code fails on very large input files because it takes too long to finish.

rewind(ptr);
j = 0;
while ((fread(&temp, sizeof(temp), 1, ptr) == 1) && (j != lines - 1)) // read object by object
{
    i = j + 1;
    while (fread(&temp1, sizeof(temp), 1, ptr) == 1) // read next object, to compare previous object with next object
    {
        if (temp.key > temp1.key) // compare key value of object
        {
            temp2 = temp; // if you don't want to change records and just want to change keys use three statements temp2.key = temp.key;
            temp = temp1;
            temp1 = temp2;
            fseek(ptr, j * sizeof(temp), SEEK_SET); // move stream to overwrite
            fwrite(&temp, sizeof(temp), 1, ptr);    // you can avoid above swap by changing &temp to &temp1
            fseek(ptr, i * sizeof(temp), SEEK_SET); // move stream to overwrite
            fwrite(&temp1, sizeof(temp), 1, ptr);   // you can avoid above swap by changing &temp1 to &temp
            fseek(ptr, 0L, SEEK_CUR);               // C requires a positioning call between a write and the next read
        }
        i++;
    }
    j++;
    fseek(ptr, j * sizeof(temp), SEEK_SET);
}

Any ideas on how to make this C code much faster? Also, would using qsort() (predefined in C) be much faster, and how should it be applied to the above code?

Solution

You asked the question "Sorting based on key from a file" and were given various answers about how to sort in memory. You added a supplemental question as an answer, and then created this question instead (which was the correct thing to do).

Your code here is basically a disk-based bubble sort, with O(N²) complexity, and poor time performance because every comparison costs a file read and every swap costs two seeks and two writes. A bubble sort is a bad choice at the best of times: simple, yes, but slow.

The basic ways to speed up sorting programs are:

  1. If possible, read all the data into memory, sort in memory, and write the result out.
  2. If it won't all fit into memory, read as much into memory as possible, sort it, and write the sorted data to a temporary file. Repeat as often as necessary to sort all the data. Then merge the temporary files into one file. If the data set is truly astronomical (or the memory truly minuscule), you may have to create intermediate merge files. These days, though, you have to be sorting many hundreds of gigabytes for that to be an issue at all, even on a 32-bit computer.
  3. Make sure you choose a good sorting algorithm. Quick sort with appropriate pivot selection is very good. You could look up 'introsort' too.
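
As a rough illustration of option 1, here is a minimal sketch that reads the whole file into memory, sorts it with the standard qsort(), and writes it back. The record layout (`struct record` with an `int key` and a 28-byte payload) and the names `compare_records` and `sort_file_in_memory` are assumptions for the example; the question only shows that each record has a `key` member.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record layout: the question only shows a `key` member. */
struct record {
    int key;
    char payload[28];
};

/* qsort() comparator: ascending key order, written to avoid integer overflow. */
static int compare_records(const void *a, const void *b)
{
    const struct record *ra = a;
    const struct record *rb = b;
    return (ra->key > rb->key) - (ra->key < rb->key);
}

/* Sort a file of fixed-size records in place: one bulk read, one qsort(),
   one bulk write. Returns 0 on success, -1 on failure. */
int sort_file_in_memory(const char *path)
{
    FILE *fp = fopen(path, "rb+");
    if (fp == NULL)
        return -1;

    /* Use the file size to work out how many records there are. */
    if (fseek(fp, 0L, SEEK_END) != 0) { fclose(fp); return -1; }
    long size = ftell(fp);
    if (size < 0) { fclose(fp); return -1; }
    size_t count = (size_t)size / sizeof(struct record);
    rewind(fp);

    if (count == 0) { fclose(fp); return 0; }   /* nothing to sort */

    struct record *recs = malloc(count * sizeof *recs);
    if (recs == NULL) { fclose(fp); return -1; }

    int status = -1;
    if (fread(recs, sizeof *recs, count, fp) == count) {
        qsort(recs, count, sizeof *recs, compare_records);
        rewind(fp);                              /* back to the start to overwrite */
        if (fwrite(recs, sizeof *recs, count, fp) == count)
            status = 0;
    }

    free(recs);
    if (fclose(fp) != 0)
        status = -1;
    return status;
}
```

Compared with the loop in the question, this touches the file twice in total instead of once per comparison, and qsort() typically needs on the order of N log N comparisons rather than N².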

You'll find example in-memory sorting code in the answers to the cross-referenced question (your original question). If you choose to write your own sort, consider basing its interface on the standard C qsort() function. If you write a quicksort, you should look at "Quicksort: Choosing the pivot", where the answers have copious references.
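
If you do write your own, the following is a minimal sketch of a quicksort with a qsort()-style interface. It uses median-of-three pivot selection, which is only one of several reasonable strategies, and a simple Lomuto partition chosen here for brevity. The names `quick_sort`, `swap_bytes` and `compare_ints` are made up for the example.

```c
#include <stddef.h>

/* Swap two elements of `size` bytes each, one byte at a time. */
static void swap_bytes(char *a, char *b, size_t size)
{
    while (size-- > 0) {
        char t = *a;
        *a++ = *b;
        *b++ = t;
    }
}

/* Example comparator for int arrays, same contract as for qsort(). */
int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Quicksort with the same parameters as the standard qsort(). */
void quick_sort(void *base, size_t nmemb, size_t size,
                int (*cmp)(const void *, const void *))
{
    char *a = base;

    while (nmemb > 1) {
        char *lo = a;
        char *mid = a + (nmemb / 2) * size;
        char *hi = a + (nmemb - 1) * size;

        /* Median of three: order lo, mid, hi so the median sits in mid. */
        if (cmp(mid, lo) < 0) swap_bytes(mid, lo, size);
        if (cmp(hi, lo) < 0) swap_bytes(hi, lo, size);
        if (cmp(hi, mid) < 0) swap_bytes(hi, mid, size);

        /* Park the pivot in the last slot, then partition (Lomuto). */
        swap_bytes(mid, hi, size);
        size_t store = 0;
        for (size_t i = 0; i + 1 < nmemb; i++) {
            if (cmp(a + i * size, hi) < 0) {
                swap_bytes(a + i * size, a + store * size, size);
                store++;
            }
        }
        swap_bytes(a + store * size, hi, size);  /* pivot to its final place */

        /* Recurse into the smaller part and loop on the larger one,
           which keeps the recursion depth at O(log N). */
        size_t left = store;
        size_t right = nmemb - store - 1;
        char *right_base = a + (store + 1) * size;
        if (left < right) {
            quick_sort(a, left, size, cmp);
            a = right_base;
            nmemb = right;
        } else {
            quick_sort(right_base, right, size, cmp);
            nmemb = left;
        }
    }
}
```

A call looks exactly like a qsort() call, for example `quick_sort(values, n, sizeof values[0], compare_ints);`. Note that this simple partition scheme still degrades to quadratic time when many keys are equal, which is one reason the library qsort() or an introsort is usually the safer choice.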

You'll find example merging code in the answer to "Merging multiple sorted files into one file". The merging code out-performs the system sort program in its merge mode, which is intriguing since it is not highly polished code (but it is reasonably workmanlike).
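
To give a feel for the merge phase, here is a minimal sketch (not the code from that answer) that merges several already-sorted run files into one output stream. It assumes the same hypothetical fixed-size `struct record` with an `int key`, and it picks the smallest head record with a linear scan, which is adequate for a handful of runs; a heap would be the usual choice for many runs. The name `merge_runs` is made up for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record layout: the question only shows a `key` member. */
struct record {
    int key;
    char payload[28];
};

/* Merge `nruns` sorted files (open for reading, positioned at the start)
   into `out`. Returns 0 on success, -1 on failure. */
int merge_runs(FILE **runs, size_t nruns, FILE *out)
{
    if (nruns == 0)
        return 0;

    struct record *head = malloc(nruns * sizeof *head); /* next record of each run */
    int *live = malloc(nruns * sizeof *live);           /* 1 while a run has data */
    if (head == NULL || live == NULL) {
        free(head);
        free(live);
        return -1;
    }

    /* Prime the merge with the first record of every run. */
    for (size_t i = 0; i < nruns; i++)
        live[i] = fread(&head[i], sizeof head[i], 1, runs[i]) == 1;

    int status = 0;
    for (;;) {
        /* Find the run whose current record has the smallest key. */
        size_t best = nruns;
        for (size_t i = 0; i < nruns; i++) {
            if (live[i] && (best == nruns || head[i].key < head[best].key))
                best = i;
        }
        if (best == nruns)
            break;                      /* every run is exhausted */

        if (fwrite(&head[best], sizeof head[best], 1, out) != 1) {
            status = -1;
            break;
        }
        /* Refill from the run that just supplied a record. */
        live[best] = fread(&head[best], sizeof head[best], 1, runs[best]) == 1;
    }

    free(head);
    free(live);
    return status;
}
```

Each record is read once and written once, so the merge cost is dominated by sequential I/O rather than by seeks.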

You could look at the external sort program described in Software Tools (Kernighan and Plauger), though it is a bit esoteric in that it is written in 'RatFor' or Rational Fortran. The design, though, is readily transferable to other languages.
