什么是初始化集合时HashSet的做内存? [英] What does a hashset do with memory when initializing a collection?
问题描述
我偶然发现了以下问题。结果
我想与所有数字HashSet的1到100.000.000。
我尝试下面的代码:
I stumbled upon the following problem.
I want a hashset with all numbers from 1 to 100.000.000.
I tried the following code:
var mySet = new HashSet<int>();
for (var k = 1; k <= 100000000; k++)
mySet.Add(k);
这代码没有做它,因为我得到了内存溢出的地方周围的49mil。这也是相当缓慢和内存的过度增长。
That code didn't make it since I got memory overflow somewhere around the 49mil. This was also pretty slow and memory grew excessively.
然后我尝试这样做。
var mySet = Enumerable.Range(1, 100000000).ToHashSet();
其中ToHashSet()是下面的代码:
where ToHashSet() is the following code:
public static HashSet<T> ToHashSet<T>(this IEnumerable<T> source)
{
return new HashSet<T>(source);
}
我又得到了一个内存溢出,但我能够把更多的数字则与前面的代码
I got a memory overflow again but I was able to put in more numbers then with the previous code.
这不工作的事情是以下内容:
The thing that does work is the following:
var tempList = new List<int>();
for (var k = 1; k <= 100000000; k++)
tempList.Add(k);
var numbers = tempList.ToHashSet();
这需要我的系统只需填写tempList所在Enumerable.Range()只对大约800毫秒4蜱!
It takes about 800ms on my system to just fill the tempList where the Enumerable.Range() only takes 4 ticks!
我需要一个HashSet的,否则它会采取太多的时间来查找值(我需要它是O(1)),这将是巨大的如果我能做到这一点的最快方法
I do need that HashSet or else it would take to much time to lookup values (I need it to be O(1)) and it would be great if I could do that the fastest way.
现在我的问题是:结果
前两种方法为什么会导致内存溢出的地方第三个不?
Now my question is:
Why do the first two methods cause a memory overflow where the third doesn't?
是不是有什么特别的HashSet与初始化?
Is there something special HashSet does with the memory on initializing?
我的系统有16GB的内存,所以我的时候,我得到了溢出异常颇为惊讶。
My system has 16GB memory so i was quite surprised when I got the overflow exceptions.
推荐答案
像其他集合类型中,在添加元素所需的HashSet会自动增加其容量。在添加大量元素,这将导致大量的重新分配的。
Like other collection types, the HashSet will automatically increase its capacity as required as you add elements. When adding a large number of elements, this will result in a large number of reallocations.
如果你有一个构造函数初始化一个的IEnumerable< T>
,它会检查的IEnumerable< T>
实际上是一个的ICollection< T>
,如果是这样,初始化的HashSet的能力的大小。集合
If you initialize it with a constructor that takes an IEnumerable<T>
, it will check if the IEnumerable<T>
is in fact an ICollection<T>
, and if so, initialize the HashSet's capacity to the size of the collection.
这是正在发生的事情你是第三个例子 - 你要添加一个列表< T>
这也是一个的ICollection< T>
,所以你的HashSet给出的初始容量等于列表的大小,从而确保不需要重新分配
This is what's happening in you're third example - you're adding a List<T>
which is also an ICollection<T>
, so your HashSet is given an initial capacity equal to the size of the list, thus ensuring that no reallocations are needed.
您,如果您使用列表与LT更加高效; T>
构造函数能力的参数,因为这将避免重新分配构建列表时:
You will be even more efficient if you use the List<T>
constructor that takes a capacity parameter, as this will avoid reallocations when building the list:
var noElements = 100000000;
var tempList = new List<int>(noElements);
for (var k = 1; k <= noElements; k++)
tempList.Add(k);
var numbers = tempList.ToHashSet();
至于你的系统内存;检查,如果这是一个32位或64位的过程。 32位进程可以最大2GB的内存(3GB,如果你已经使用了/ 3GB启动开关)。
As for your system memory; check if this is a 32-bit or 64-bit process. A 32-bit process has a maximum of 2GB memory available (3GB if you've used the /3GB startup switch).
与其他集合类型(如列表< T>
,词典< TKEY的,TValue>
),的HashSet< T>
没有一个构造函数一个容量
参数设置初始容量。如果你想初始化一个的HashSet< T>
与大量元素,最有效的方式这样做可能是第一个元素添加到一个数组或列表< T>
通过适当的能力,然后通过这个数组或列表的的HashSet< T>
构造
Unlike other collection types (e.g. List<T>
, Dictionary<TKey,TValue>
), HashSet<T>
doesn't have a constructor that takes a capacity
parameter to set the initial capacity. If you want to initialize a HashSet<T>
with a large number of elements, the most efficient way to do so is probably to first add the elements to an array or List<T>
with the appropriate capacity, then pass this array or list to the HashSet<T>
constructor.
这篇关于什么是初始化集合时HashSet的做内存?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!