Sorting a txt file of 5 million records (without using any technique)?


Problem description

I have a file with 5 million records consisting of numbers
that are out of sequence (irregular).
You can find the file structure below:

Input (example)       Desired result
---------------       --------------
 723,80                1,4
 14,50                 1,5
 723,2                 10,8
 1,5                   14,50
 10,8                  723,2
 1,4                   723,80








The left column shows the current (unsorted) state and the right column
the desired (sorted) state; I want to reach the sorted state.

The most important point:
I am not using any techniques such as LINQ, ....
I want to arrange the file using available algorithms.

Furthermore, time matters: we need a suitable algorithm that puts
the numbers in order in under a minute.

Thanks

Recommended answer

If the whole file fits in memory, I'd read it all in with File.ReadAllLines().
Make one pass through all of the lines and build a parallel array that holds the parsed numeric keys.
Then use Array.Sort<TKey, TValue>(TKey[] keys, TValue[] items) to do the sorting.
(This sorts both arrays together, in the order of the keys array.)
Then rewrite the file using the sorted array of lines.
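As a rough illustration of the in-memory approach, here is a minimal C# sketch. It assumes every line has the form first,second with non-negative integers and that there are no blank lines. Both columns are packed into one long key, so a single Array.Sort(keys, lines) call orders by the first number and then by the second, which matches the desired result in the question. The class and method names (InMemorySort.SortFile) are hypothetical.

```csharp
using System;
using System.IO;

// Sketch of the in-memory approach: read all lines, build a parallel array of
// numeric keys, sort both arrays together with Array.Sort(keys, items), write back.
// Assumption: each line looks like "first,second" where both parts are
// non-negative integers that fit in an int (as in the example in the question).
public static class InMemorySort
{
    public static void SortFile(string inputPath, string outputPath)
    {
        string[] lines = File.ReadAllLines(inputPath);
        long[] keys = new long[lines.Length]; // keys[i] is the sort key of lines[i]

        for (int i = 0; i < lines.Length; i++)
        {
            string[] parts = lines[i].Split(',');
            int first = int.Parse(parts[0].Trim());
            int second = parts.Length > 1 ? int.Parse(parts[1].Trim()) : 0;

            // Pack both columns into one 64-bit key: the high 32 bits (first column)
            // decide the order and the low 32 bits (second column) break ties.
            // This ordering is only guaranteed correct for non-negative values.
            keys[i] = ((long)first << 32) | (uint)second;
        }

        // Sorts keys and applies exactly the same reordering to lines.
        Array.Sort(keys, lines);
        File.WriteAllLines(outputPath, lines);
    }
}
```

For example, InMemorySort.SortFile("input.txt", "sorted.txt") (hypothetical paths) writes the sorted copy. Packing the pair into one long avoids a custom comparison delegate; data with negative numbers would need a comparer instead.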

If the whole thing will not fit in memory, then you could read the file line-by-line, extract and parse the key into an array, and simultaneously construct a parallel array of objects with the start byte position in the file and the byte length of each record.
Do the Array.Sort() as above then write a new output file, copying each record from the input file by seeking to the record position and copying length bytes to the output file.
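The second approach (keys in memory, records left on disk) could be sketched as follows. This assumes single-byte ASCII text, records terminated by '\n' (a preceding '\r' is simply copied with the record), no blank lines, and the same first,second format of non-negative integers. The class name OffsetSort is hypothetical. Pass 1 scans raw bytes so that the recorded byte offsets are exact; pass 2 seeks to each record in sorted order and copies its bytes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Sketch: keep only a key plus (byte offset, byte length) per record in memory,
// sort those, then copy the records from the input file in sorted order.
// Assumptions: ASCII text, '\n'-terminated records, no blank lines,
// each record is "first,second" with non-negative integers.
public static class OffsetSort
{
    public static void SortFile(string inputPath, string outputPath)
    {
        var keys = new List<long>();
        var spans = new List<(long Start, int Length)>();

        // Pass 1: scan raw bytes and record where each line starts and how long it is.
        using (var input = new FileStream(inputPath, FileMode.Open, FileAccess.Read))
        {
            var text = new StringBuilder();
            long start = 0, pos = 0;
            int b;
            while ((b = input.ReadByte()) != -1)
            {
                pos++;
                if (b != '\n') { text.Append((char)b); continue; }
                keys.Add(ParseKey(text.ToString()));
                spans.Add((start, (int)(pos - start))); // length includes the '\n'
                start = pos;
                text.Clear();
            }
            if (pos > start) // last line without a trailing '\n'
            {
                keys.Add(ParseKey(text.ToString()));
                spans.Add((start, (int)(pos - start)));
            }
        }

        // Sort the small (key, position) data instead of the records themselves.
        long[] keyArray = keys.ToArray();
        (long Start, int Length)[] spanArray = spans.ToArray();
        Array.Sort(keyArray, spanArray);

        // Pass 2: seek to each record in sorted order and copy its bytes.
        using (var input = new FileStream(inputPath, FileMode.Open, FileAccess.Read))
        using (var output = new FileStream(outputPath, FileMode.Create, FileAccess.Write))
        {
            byte[] buffer = new byte[256];
            foreach (var (recStart, recLength) in spanArray)
            {
                if (buffer.Length < recLength) buffer = new byte[recLength];
                input.Seek(recStart, SeekOrigin.Begin);
                int read = 0;
                while (read < recLength)
                {
                    int n = input.Read(buffer, read, recLength - read);
                    if (n == 0) throw new EndOfStreamException();
                    read += n;
                }
                output.Write(buffer, 0, recLength);
                // The unterminated last input line may land mid-file, so terminate it.
                if (buffer[recLength - 1] != (byte)'\n') output.WriteByte((byte)'\n');
            }
        }
    }

    // Packs (first, second) into one 64-bit key so that ordering by the key orders
    // by the first column, then the second; valid only for non-negative values.
    private static long ParseKey(string record)
    {
        string[] parts = record.Trim().Split(',');
        int first = int.Parse(parts[0].Trim());
        int second = parts.Length > 1 ? int.Parse(parts[1].Trim()) : 0;
        return ((long)first << 32) | (uint)second;
    }
}
```

A StreamReader is avoided in pass 1 because it reads ahead into its own buffer and does not report the byte position of each line. The per-record seeks in pass 2 are random reads, which can be slow on spinning disks.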

[edit: Matt T Heffron]
From your comment it looks like you can pull the whole thing into memory, so I'm modifying the code from your comment. I'm a little unclear on the nature of your sort key. The example in the question suggests floating point keys (shown with "," as the decimal separator). However, your comment's code appears to use only the integer before the first comma as the sort key, while the original example implies a secondary sort on the second number. I'll first show sorting by the first integer column only, ignoring the second column:
using System;
using System.IO;

var lines = File.ReadAllLines(fileunordred);  // fileunordred: path of the unsorted input file
int[] allCustomerIds = new int[lines.Length];  // make it the same length as lines
char[] splitter = new char[]{','};
for (int ix = 0; ix < lines.Length; ++ix)
{
  var splitLine = lines[ix].Split(splitter, 2);
  int customerId;
  if (!int.TryParse(splitLine[0], out customerId))
  {
    // error parsing the data, do something "sensible"
    customerId = -1;  // some value to indicate a "bad" row, so bad rows sort together
  }
  allCustomerIds[ix] = customerId;
}
Array.Sort(allCustomerIds, lines);  // reorders lines to match the sorted ids
// In your code, the <int,int> type arguments are wrong,
// and <int,string> is unnecessary since the compiler can infer it.



Both arrays are now sorted in ascending numerical order by customer ID.
Just use File.WriteAllLines("filename", lines) to write the sorted file.


If the second column should act as a tiebreaker (as the example in the question implies), the code is very similar, but it compares first by the first integer and then, among rows with the same first integer, by the second integer.

var lines = File.ReadAllLines(fileunordred);
int[][] allInfo = new int[lines.Length][];  // one {first, second} pair per line
char[] splitter = new char[]{','};
for (int ix = 0; ix < allInfo.Length; ++ix)
{
  var splitLine = lines[ix].Split(splitter);
  int[] pair = new int[2];
  allInfo[ix] = pair;
  int id;
  if (!int.TryParse(splitLine[0], out id))
  {
    // error parsing the data, do something "sensible"
    id = -1;  // some value to indicate a "bad" row, to sort together
  }
  pair[0] = id;
  if (splitLine.Length < 2 || !int.TryParse(splitLine[1], out id))
  {
    // missing or unparsable second column
    id = -1;
  }
  pair[1] = id;
}
Array.Sort(allInfo, (a, b) => {
  int comp = a[0].CompareTo(b[0]);
  return comp == 0 ? a[1].CompareTo(b[1]) : comp;
});
// Now just rewrite the file from the integers (the lines array IS NOT sorted)
using (var writer = new StreamWriter("outputfilename"))  // "out" is a C# keyword, so it can't be a variable name
{
  foreach (var pair in allInfo)
  {
    writer.WriteLine("{0},{1}", pair[0], pair[1]);
  }
}



(I haven't actually tried this but it should be close...)


Quote:

I didn't use any techniques such as linq, ....
I want to do it with available algorithms and arrange the file.



One of the best sorting algorithms is quicksort[^]. Happy coding.
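Since the question asks for a hand-written algorithm rather than library helpers, here is a sketch of quicksort over two parallel arrays, doing the same job as Array.Sort(keys, items). QuickSorter is a hypothetical name. It uses Hoare partitioning with the middle element as the pivot and recurses only into the smaller part, so the recursion depth stays logarithmic. The worst case is still O(n²) comparisons for unlucky inputs, which is one reason .NET's own Array.Sort uses introsort instead.

```csharp
using System;

// Hand-written quicksort over parallel arrays: keys[i] is the sort key for
// items[i], and both arrays are permuted together (no LINQ, no Array.Sort).
public static class QuickSorter
{
    public static void Sort(long[] keys, string[] items)
    {
        if (keys.Length != items.Length)
            throw new ArgumentException("Arrays must have the same length.");
        Sort(keys, items, 0, keys.Length - 1);
    }

    private static void Sort(long[] keys, string[] items, int lo, int hi)
    {
        while (lo < hi)
        {
            int p = Partition(keys, items, lo, hi);
            // Recurse into the smaller half and loop on the larger one,
            // which bounds the recursion depth to O(log n).
            if (p - lo < hi - p)
            {
                Sort(keys, items, lo, p);
                lo = p + 1;
            }
            else
            {
                Sort(keys, items, p + 1, hi);
                hi = p;
            }
        }
    }

    // Hoare partition with the middle element as pivot. Returns j (lo <= j < hi)
    // such that every key in [lo..j] is <= every key in [j+1..hi].
    private static int Partition(long[] keys, string[] items, int lo, int hi)
    {
        long pivot = keys[lo + (hi - lo) / 2];
        int i = lo - 1, j = hi + 1;
        while (true)
        {
            do { i++; } while (keys[i] < pivot);
            do { j--; } while (keys[j] > pivot);
            if (i >= j) return j;
            Swap(keys, i, j);
            Swap(items, i, j);
        }
    }

    private static void Swap<T>(T[] a, int i, int j)
    {
        T tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

To sort the file this way, parse one numeric key per line (widened to long) as in the code above, call QuickSorter.Sort(keys, lines), and write lines back with File.WriteAllLines. Like Array.Sort, this quicksort is not stable, so lines with equal keys may come out in any order.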


5 million records? Do you want to torture yourself by using a flat file to store a large amount of data?

First, I would suggest changing the way you store the data. In the meantime, use OleDb. It should be the easiest and quickest way to sort the data. Please see:
Other Text File Driver Programming Details[^]
Examples:
My past answers[^]
How to: Use OleDb to import text files (tab, csv, custom)[^]

