Unique Number Generation


Question

Hello people.
I could not phrase the question properly the first time.
My question is: what is the best way to check a 16-digit number for duplicates against a list of 100 million+ numbers stored in a text file? The numbers are generated by another program that I cannot control.
One conventional way is to load the numbers into an array and compare each one with the target number in a loop. But this process is very time-consuming and memory-hungry. Is there a better way to do this?
Thanks in advance.

Recommended answer

Hi Pallab,

Using a file-based merge sort would be the way to go for you.
Please read this blog entry:

http://splinter.com.au/blog/?p=142

This merge sort works by breaking one big file into smaller chunks. Each chunk is sorted conventionally in memory and written to disk. Then, in a final operation, the chunks are merged into one big sorted file. This way you can keep the memory footprint quite small even for very big files. The largest piece of information held in memory at any one time is one of the chunks the original was broken into.
(Pretty neat, isn't it? This algorithm dates from way back in the 60s/70s, when main memory was a very costly thing.)

Modification:

Forgot this in my original answer. After the file-based merge sort is done, you'll have a sorted file that may contain some duplicates. So you open this file and scan through it line by line, always remembering the last unique entry. If the next line read contains the same number as the last one, it is not output to the final file. Lather, rinse, repeat ...
End of Modification


Cheers,


Manfred
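
A minimal Python sketch of the file-based merge sort described above, with the duplicate-dropping pass folded into the final merge. It assumes one number per line with no leading zeros (so the values can be parsed as integers); the function names and the `chunk_lines` parameter are hypothetical, chosen only for illustration, and this is not the code from the linked blog entry.

```python
import heapq
import os
import tempfile
from itertools import islice


def sort_chunks(src_path, chunk_lines, tmp_dir):
    """Split src_path into sorted chunk files and return their paths."""
    chunk_paths = []
    with open(src_path) as src:
        while True:
            # Read at most chunk_lines lines; an empty list means end of file.
            lines = list(islice(src, chunk_lines))
            if not lines:
                break
            numbers = sorted(int(line) for line in lines if line.strip())
            fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".chunk")
            with os.fdopen(fd, "w") as out:
                out.writelines(f"{n}\n" for n in numbers)
            chunk_paths.append(path)
    return chunk_paths


def merge_unique(chunk_paths, dst_path):
    """Merge sorted chunks into dst_path, skipping repeats.

    Returns the skipped values, one entry per extra occurrence.
    """
    files = [open(p) for p in chunk_paths]
    duplicates = []
    try:
        streams = [(int(line) for line in f) for f in files]
        last = None  # the last unique entry written so far
        with open(dst_path, "w") as dst:
            # heapq.merge reads the sorted streams lazily, one line at a time.
            for n in heapq.merge(*streams):
                if n == last:
                    duplicates.append(n)
                    continue
                dst.write(f"{n}\n")
                last = n
    finally:
        for f in files:
            f.close()
    return duplicates


def external_sort_unique(src_path, dst_path, chunk_lines=1_000_000):
    """Sort src_path into dst_path without loading the whole file into memory."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        chunk_paths = sort_chunks(src_path, chunk_lines, tmp_dir)
        return merge_unique(chunk_paths, dst_path)
```

With 100 million lines and the default chunk size this would keep roughly 100 chunk files open during the merge, so `chunk_lines` may need adjusting to the available memory and the open-file limit of the system.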


You could load them in groups of 10,000 numbers, compare, and if the number is not found, load the next 10,000. The only things that should cause the loop to exit are the number being found or the end of the file being reached.

You didn't specify the exact format of the file, so I'm assuming that each number is on its own line.

Another thing you could do is find the largest number that exists in the big file and keep it in a separate file. Then you could first check whether the new number is larger than the one in your single-number file. If it is, you're golden, and you can add it to the big file. If not, you can run through the big file to see whether it's already in there.

BTW, sorting the contents of the file probably won't help, because you can't assume that the other application will be adding numbers in any kind of order, and you'll end up constantly re-sorting the file before conducting your search. Not very efficient.
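
The group-wise scan and the largest-number shortcut from this answer could be sketched roughly as follows in Python. It assumes one number per line, as the answer does. The function names are hypothetical, and the cached maximum is passed in as a plain value rather than read from a separate file, to keep the sketch short.

```python
from itertools import islice


def contains_number(path, target, group_size=10_000):
    """Scan the file in groups of group_size lines and stop at the first match."""
    with open(path) as f:
        while True:
            group = list(islice(f, group_size))
            if not group:
                return False  # end of file reached without a match
            if any(line.strip() and int(line) == target for line in group):
                return True  # number found, exit early


def find_max(path):
    """Return the largest number in the file, or None if the file is empty."""
    with open(path) as f:
        return max((int(line) for line in f if line.strip()), default=None)


def is_duplicate(path, target, known_max):
    """Check the cached maximum first, then fall back to a full scan."""
    if known_max is not None and target > known_max:
        return False  # larger than everything in the file, so it must be new
    return contains_number(path, target)
```

The shortcut only helps when new numbers tend to be larger than the existing ones; otherwise every check still falls through to a linear scan of the file.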



For starters, I'd want to get the guy who thought it was a good idea to store 100+ million numbers in a text file, put him in a corner, and kick him a few times.

After I got out of jail, I would probably transfer the numbers into a simple db so that I could take advantage of the database's search algorithms.
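
One possible way to do this, sketched with Python's built-in `sqlite3` module. The answer does not name a database, so SQLite, the `numbers` table, and the function names are all assumptions made for illustration. A 16-digit number fits in SQLite's 64-bit integer type, and declaring the column as `INTEGER PRIMARY KEY` lets the database handle both the indexed lookup and the rejection of duplicates.

```python
import sqlite3


def load_numbers(conn, path, batch_size=100_000):
    """Copy the numbers from the text file into an indexed table, in batches."""
    conn.execute("CREATE TABLE IF NOT EXISTS numbers (value INTEGER PRIMARY KEY)")
    batch = []
    with open(path) as f:
        for line in f:
            if line.strip():
                batch.append((int(line),))
            if len(batch) >= batch_size:
                # INSERT OR IGNORE silently skips values that already exist.
                conn.executemany("INSERT OR IGNORE INTO numbers VALUES (?)", batch)
                batch.clear()
    if batch:
        conn.executemany("INSERT OR IGNORE INTO numbers VALUES (?)", batch)
    conn.commit()


def exists(conn, target):
    """Return True if target is already stored; uses the primary-key index."""
    row = conn.execute(
        "SELECT 1 FROM numbers WHERE value = ?", (target,)
    ).fetchone()
    return row is not None
```

A usage sketch, with a hypothetical file name: `conn = sqlite3.connect("numbers.db")`, then `load_numbers(conn, "numbers.txt")` once, and `exists(conn, 1234567890123456)` for each new number.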

