Sorting 20GB of data


Question

In the past I had to work with big files, somewhere in the 0.1-3 GB range. Not all the 'columns' were needed, so it was OK to fit the remaining data in RAM. Now I have to work with files in the 1-20 GB range, and they will probably grow over time. That is a completely different situation, because you can no longer fit the data in RAM.

My file contains several million 'entries' (I have found one with 30 million entries). An entry consists of about 10 'columns': one string (50-1000 Unicode chars) and several numbers. I have to sort the data by 'column' and show it. For the user, only the top entries (1-30%) are relevant; the rest is low-quality data.

So, I need some suggestions about which direction to head in. I definitely don't want to put the data in a DB, because databases are hard to install and configure for non-computer-savvy persons. I would like to deliver a monolithic program.

Showing the data is not difficult at all. But sorting... without loading the data into RAM, on regular PCs (2-6 GB of RAM)... will kill some good hours.

I was looking a bit into MMFs (memory-mapped files), but this article from Danny Thorpe shows that they may not be suitable: http://dannythorpe.com/2004/03/19/the-hidden-costs-of-memory-mapped-files/

So, I was thinking about loading into RAM only the data from the column that has to be sorted, together with a pointer to the address (the offset into the disk file) of each 'entry'. I sort the 'column', then I use the pointers to find the entry corresponding to each column cell, and restore the entries in order. The 'restoration' is written directly to disk, so no additional RAM is required.
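The key-plus-offset idea described above can be sketched roughly as follows. Python is used here only for brevity (the question targets Delphi/FPC; the technique itself is language-agnostic), and the file format is a hypothetical one: line-based records with the sort key as the first tab-separated field.

```python
# Sketch: sort by one column while keeping only (key, offset) pairs in RAM.
# Assumes a hypothetical line-based format with a tab-separated key field.

def sort_by_column(in_path, out_path):
    keys = []  # list of (key, byte_offset) pairs: the only data held in RAM
    with open(in_path, "rb") as f:
        offset = f.tell()
        # iter(f.readline, b"") avoids read-ahead buffering, so tell() stays
        # accurate at each line boundary.
        for line in iter(f.readline, b""):
            key = line.split(b"\t", 1)[0]
            keys.append((key, offset))
            offset = f.tell()

    keys.sort()  # only keys and offsets are sorted in memory

    # 'Restoration': seek to each entry and stream it to disk in sorted
    # order, so no additional RAM is needed for the entries themselves.
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        for _, off in keys:
            src.seek(off)
            dst.write(src.readline())
```

Note that the restoration phase performs one random seek per entry, which can be slow on spinning disks; for very large files the external mergesort suggested in the answer below avoids that.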

PS: I am looking for a solution that will work in both Lazarus and Delphi, because Lazarus (actually FPC) has 64-bit support for the Mac. 64 bits means more RAM available = faster sorting.

Answer

I think the way to go is Mergesort; it's a great algorithm for sorting a large number of fixed-size records with limited memory.

General idea:


  • read N lines from the input file (a value that allows you to keep the lines in memory)
  • sort these lines and write the sorted lines to file 1
  • repeat with the next N lines to obtain file 2

...

you reach the end of the input file, and you now have M files (each of which is sorted). Then you only have to merge those M sorted files into a single output file, reading one line at a time from each.
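The steps above, together with the final merge of the M sorted runs, can be sketched as follows. Again, Python is used only to illustrate the algorithm (the question targets Delphi/FPC), and the record format is assumed to be one line per record; `n_lines` plays the role of N from the steps above.

```python
import heapq
import os
import tempfile

def external_sort(in_path, out_path, n_lines=100_000):
    # Phase 1: read N lines at a time, sort each chunk in memory, and
    # write it to its own temporary run file (files 1..M).
    run_paths = []
    with open(in_path, "r", encoding="utf-8") as src:
        while True:
            chunk = [line for _, line in zip(range(n_lines), src)]
            if not chunk:
                break  # reached the end of the input file
            chunk.sort()
            fd, path = tempfile.mkstemp(text=True)
            with os.fdopen(fd, "w", encoding="utf-8") as run:
                run.writelines(chunk)
            run_paths.append(path)

    # Phase 2: k-way merge of the M sorted runs. heapq.merge consumes
    # the runs lazily, so only one line per run is held in memory.
    runs = [open(p, "r", encoding="utf-8") for p in run_paths]
    try:
        with open(out_path, "w", encoding="utf-8") as dst:
            dst.writelines(heapq.merge(*runs))
    finally:
        for r in runs:
            r.close()
        for p in run_paths:
            os.remove(p)
```

The same two-phase structure carries over directly to Delphi/FPC with `TFileStream` and a small priority queue for the merge; memory use is bounded by N lines in phase 1 and by M buffered lines in phase 2.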

You could also consider a solution based on an embedded database, e.g. Firebird Embedded: it works well with Delphi/Windows, and you only have to add some DLLs to your program folder (I'm not sure about Lazarus/OSX).
