Is it safe to keep a large volume of data in a HashSet or any other list?

Problem description

I have three files, F1, F2 and F3, each containing millions of records.
These files contain numbers. My requirement is to check for duplicates across the files; for example, a number N1 should be unique across all of them. I am doing this by putting all the records in a hash set and processing it. So my question is: can I put around 100 million records in a HashSet? Will there be any memory problems?
I can't use a database, so if there is any other way to do this, please tell me.

Recommended answer

In principle you can, but you will push the limits of the available RAM on some computers. You need less memory than you might think if you compose your hash set wisely.
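To put a rough number on the memory concern, here is a back-of-envelope sketch for a `HashSet<long>` holding 100 million numbers. The per-element costs are assumptions, not something stated above: about 8 bytes of bookkeeping per element (a cached hash code and a chain link), the 8-byte value itself, and about 4 bytes per bucket. The real layout depends on the runtime version and on how far the internal arrays have over-allocated, so treat the result as a lower bound.

```csharp
using System;

public static class HashSetMemoryEstimate
{
    // Assumed per-element bookkeeping: 4-byte cached hash code + 4-byte "next" index.
    private const long EntryOverheadBytes = 8;
    // Assumed cost of one bucket slot (an int index).
    private const long BucketBytes = 4;

    // Rough lower bound in bytes for `count` elements of `valueSizeBytes` each.
    public static long EstimateBytes(long count, int valueSizeBytes)
    {
        return count * (EntryOverheadBytes + valueSizeBytes + BucketBytes);
    }

    public static void Main()
    {
        long bytes = EstimateBytes(100000000, sizeof(long));
        Console.WriteLine("~{0:F1} GB before growth slack", bytes / 1e9);
    }
}
```

Under these assumptions the estimate comes to about 2 GB, which is impractical in a 32-bit process and tight on a machine with little free RAM.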

Instead of keeping the content of a file record in the values of the hash set, keep only its position in the file (or perhaps the position and the size of the record). I don't know what your key type would be or how much memory a key will take.

By the way, you probably meant the type System.Collections.Hashtable. For any new development, you should never use this type, nor any other non-specialized non-generic collection type. They were rendered obsolete as early as .NET version 2.0, when generics were introduced. They weren't formally marked with the [Obsolete] attribute only because there is nothing wrong with keeping them in well-working legacy code. Non-generic types require type casts and are hence potentially more dangerous than the generic classes you really need to use. You should pick one of these three:
http://msdn.microsoft.com/en-us/library/xfhwa508.aspx,
http://msdn.microsoft.com/en-us/library/ms132259.aspx,
http://msdn.microsoft.com/en-us/library/ms132319.aspx.
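To illustrate the point about type casts, here is a minimal sketch comparing the non-generic `Hashtable` with the generic `Dictionary<TKey, TValue>`. The key `42L`, the value `"F1"` and the method names are made up for illustration only.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class GenericVersusNonGeneric
{
    // Non-generic: keys and values are stored as object, so every read needs a cast
    // that is only checked at run time, and long keys are boxed.
    public static string ReadFromHashtable()
    {
        Hashtable legacy = new Hashtable();
        legacy[42L] = "F1";
        return (string)legacy[42L];
    }

    // Generic: the compiler checks key and value types; no cast, no boxing of keys.
    public static string ReadFromDictionary()
    {
        Dictionary<long, string> typed = new Dictionary<long, string>();
        typed[42L] = "F1";
        return typed[42L];
    }

    public static void Main()
    {
        Console.WriteLine(ReadFromHashtable() == ReadFromDictionary());
    }
}
```

With hundreds of millions of numeric keys, the boxing in the non-generic version also adds a heap object per key, which matters when memory is the bottleneck.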

The major difference between all those key-indexed containers is the trade-off between computational complexity (in practice, the time of operations) and memory overhead. As memory overhead is likely the most critical factor in your situation, you will need to study this to make the right choice.

I don't know if you can use the class System.Collections.Generic.HashSet<T> for your purpose.
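If `HashSet<T>` does fit the purpose, duplicate detection could look roughly like the sketch below. It assumes, beyond what the question states, that each file holds one integer per line and that every number fits in a 64-bit `long`; the class and method names are hypothetical. It relies on `HashSet<T>.Add` returning `false` when the value is already present.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class DuplicateFinder
{
    // Returns every number that appears more than once across all the given sequences.
    public static List<long> FindDuplicates(IEnumerable<IEnumerable<string>> sources)
    {
        HashSet<long> seen = new HashSet<long>();
        HashSet<long> duplicates = new HashSet<long>();
        foreach (IEnumerable<string> lines in sources)
        {
            foreach (string line in lines)
            {
                long value;
                if (!long.TryParse(line.Trim(), out value))
                    continue; // skip blank or malformed lines

                // Add returns false when the value is already in the set.
                if (!seen.Add(value))
                    duplicates.Add(value);
            }
        }
        return new List<long>(duplicates);
    }

    public static void Main()
    {
        // File names follow the question; File.ReadLines streams lines lazily
        // instead of loading a whole file into memory.
        string[] paths = { "F1", "F2", "F3" };
        List<IEnumerable<string>> sources = new List<IEnumerable<string>>();
        foreach (string path in paths)
            sources.Add(File.ReadLines(path));

        foreach (long d in FindDuplicates(sources))
            Console.WriteLine(d);
    }
}
```

Storing the parsed `long` rather than the line text keeps each element small, which is the main lever on memory here.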

Now, the remaining question is: what if you still need to keep more data than you can hold in RAM? Well, such a problem can certainly be solved, but it would need more work. The idea is simple: you can learn how associative containers work and implement one that uses disk storage as its main storage. Please see:
http://en.wikipedia.org/wiki/Hash_table.
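One simple way to lean on the disk, sketched below, is hash partitioning rather than a full disk-backed hash table: a first pass routes every number to one of several temporary files so that equal numbers always land in the same file, and a second pass de-duplicates each file in memory on its own. This is only one possible reading of the idea, under the same assumptions as before (one 64-bit integer per line); the class name, the partition count and the `part*.txt` file names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PartitionedDuplicateFinder
{
    public static List<long> FindDuplicates(IEnumerable<string> inputPaths, int partitions, string workDir)
    {
        // Pass 1: spread the numbers over `partitions` temporary files.
        StreamWriter[] writers = new StreamWriter[partitions];
        string[] partPaths = new string[partitions];
        for (int i = 0; i < partitions; i++)
        {
            partPaths[i] = Path.Combine(workDir, "part" + i + ".txt");
            writers[i] = new StreamWriter(partPaths[i]);
        }
        try
        {
            foreach (string path in inputPaths)
            {
                foreach (string line in File.ReadLines(path))
                {
                    long value;
                    if (!long.TryParse(line.Trim(), out value)) continue;
                    // Double modulo keeps the index non-negative for negative values.
                    int index = (int)(((value % partitions) + partitions) % partitions);
                    writers[index].WriteLine(value);
                }
            }
        }
        finally
        {
            foreach (StreamWriter w in writers) w.Dispose();
        }

        // Pass 2: each partition is small enough to de-duplicate in memory.
        List<long> duplicates = new List<long>();
        foreach (string partPath in partPaths)
        {
            HashSet<long> seen = new HashSet<long>();
            HashSet<long> reported = new HashSet<long>();
            foreach (string line in File.ReadLines(partPath))
            {
                long value = long.Parse(line);
                // Report each duplicated value once, the first time it repeats.
                if (!seen.Add(value) && reported.Add(value))
                    duplicates.Add(value);
            }
            File.Delete(partPath);
        }
        return duplicates;
    }

    public static void Main()
    {
        List<long> dups = FindDuplicates(new[] { "F1", "F2", "F3" }, 16, Path.GetTempPath());
        Console.WriteLine("{0} duplicated values", dups.Count);
    }
}
```

With 16 partitions and about 100 million evenly spread numbers, each in-memory set holds roughly 6 million values at a time. The price is one extra write and one extra read of the whole data set.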

However, I would stop here. First and foremost, I'm not quite sure that your whole approach is reasonable. To me, all solutions which involve huge memory consumption are suspicious. If I knew your exact goals, I would probably try to review the whole architecture.

—SA

