HashSet与List性能 [英] HashSet vs. List performance
问题描述
很清楚,通用 HashSet
类的搜索性能高于通用 List
class。只需将基于散列的键与 List< T>
类中的线性方法进行比较。
哈希键本身可能需要一些CPU周期,因此对于少量的项目,线性搜索可以是 HashSet< T>
的真实替代。
我的问题:盈亏平衡在哪里?
为了简化方案(公平),让我们假设 List
Equals()
方法来标识项。
<很多人都说,一旦你达到速度实际上是一个关注的大小
HashSet< T>
将总是跳过列表< T>
,但这取决于你在做什么。 你有一个列表< T>
,平均只有5个项目。在大量的循环中,如果每个循环中添加或删除单个项目,那么使用 List< T>
可能会更好。
我在我的机器上做了这个测试,并且,它必须非常非常小,以从 List< T>
。对于短字符串列表,在大小5之后,对于大小20之后的对象,优点消失。
1项LIST strs时间:617ms
1 item HASHSET strs time:1332ms
2 item LIST strs time:781ms
2 item HASHSET strs time:1354ms
3 item LIST strs time:950ms
3 item HASHSET strs time:1405ms
4 item LIST strs time:1126ms
4 item HASHSET strs time:1441ms
5项目列表strs时间:1370ms
5项HASHSET strs时间:1452ms
6项目列表strs时间:1481ms
6项HASHSET strs时间:1418ms
7项目列表strs时间:1581ms
7项HASHSET strs时间:1464ms
8项目列表strs时间:1726ms
8项HASHSET strs时间:1398ms
9 item LIST strs time:1901ms
9 item HASHSET strs time:1433ms
1 item LIST objs time:614ms
1 item HASHSET objs time:1993ms
4 item LIST objs time:837ms
4 item HASHSET objs time:1914ms
7 item LIST objs time:1070ms
7 item HASHSET objs time:1900ms
10 item LIST objs time:1267ms
10 item HASHSET objs time:1904ms
13 item LIST objs time:1494ms
13 item HASHSET objs time: 1893ms
16 item LIST objs time:1695ms
16 item HASHSET objs time:1879ms
19 item LIST objs time:1902ms
19 item HASHSET objs时间:1950ms
22 item LIST objs time:2136ms
22 item HASHSET objs time:1893ms
25 item LIST objs time:2357ms
25 item HASHSET objs time:1826ms
28 item LIST objs time:2555ms
28 item HASHSET objs time:1865ms
31 item LIST objs time:2755ms
31 item HASHSET objs time:1963ms
34 item LIST objs time:3025ms
34 item HASHSET objs time:1874ms
37 item LIST objs time:3195ms
37 item HASHSET objs time:1958ms
40 item LIST objs time:3401ms
40 item HASHSET objs time:1855ms
43 item LIST objs time:3618ms
43 item HASHSET objs time:1869ms
46 item LIST objs time:3883ms
46 item HASHSET objs time:2046ms
49 item LIST objs time :4218ms
49 item HASHSET objs time:1873ms
:
>
这是代码:
static void Main(string [] args)
{
int times = 10000000;
for(int listSize = 1; listSize <10; listSize ++)
{
List< string> list = new List< string>();
HashSet< string> hashset = new HashSet< string>();
for(int i = 0; i {
list.Add(string+ i.ToString());
hashset.Add(string+ i.ToString());
}
秒表timer = new Stopwatch();
timer.Start();
for(int i = 0; i< times; i ++)
{
list.Remove(string0);
list.Add(string0);
}
timer.Stop();
Console.WriteLine(listSize.ToString()+item LIST strs time:+ timer.ElapsedMilliseconds.ToString()+ms);
timer = new Stopwatch();
timer.Start();
for(int i = 0; i< times; i ++)
{
hashset.Remove(string0);
hashset.Add(string0);
}
timer.Stop();
Console.WriteLine(listSize.ToString()+item HASHSET strs time:+ timer.ElapsedMilliseconds.ToString()+ms);
Console.WriteLine()
}
for(int listSize = 1; listSize <50; listSize + = 3)
{
List< object> list = new List< object>();
HashSet< object> hashset = new HashSet< object>();
for(int i = 0; i {
list.Add(new object());
hashset.Add(new object());
}
object objToAddRem = list [0];
秒表timer = new Stopwatch();
timer.Start();
for(int i = 0; i< times; i ++)
{
list.Remove(objToAddRem);
list.Add(objToAddRem);
}
timer.Stop();
Console.WriteLine(listSize.ToString()+item LIST objs time:+ timer.ElapsedMilliseconds.ToString()+ms);
timer = new Stopwatch();
timer.Start();
for(int i = 0; i< times; i ++)
{
hashset.Remove(objToAddRem);
hashset.Add(objToAddRem);
}
timer.Stop();
Console.WriteLine(listSize.ToString()+item HASHSET objs time:+ timer.ElapsedMilliseconds.ToString()+ms);
Console.WriteLine();
}
Console.ReadLine();
}
It's clear that a search performance of the generic HashSet<T>
class is higher than of the generic List<T>
class. Just compare the hash-based key with the linear approach in the List<T>
class.
However calculating a hash key may itself take some CPU cycles, so for a small amount of items the linear search can be a real alternative to the HashSet<T>
.
My question: where is the break-even?
To simplify the scenario (and to be fair) let's assume that the List<T>
class uses the element's Equals()
method to identify an item.
A lot of people are saying that once you get to the size where speed is actually a concern that HashSet<T>
will always beat List<T>
, but that depends on what you are doing.
Let's say you have a List<T>
that will only ever have on average 5 items in it. Over a large number of cycles, if a single item is added or removed each cycle, you may well be better off using a List<T>
.
I did a test for this on my machine, and, well, it has to be very very small to get an advantage from List<T>
. For a list of short strings, the advantage went away after size 5, for objects after size 20.
1 item LIST strs time: 617ms
1 item HASHSET strs time: 1332ms
2 item LIST strs time: 781ms
2 item HASHSET strs time: 1354ms
3 item LIST strs time: 950ms
3 item HASHSET strs time: 1405ms
4 item LIST strs time: 1126ms
4 item HASHSET strs time: 1441ms
5 item LIST strs time: 1370ms
5 item HASHSET strs time: 1452ms
6 item LIST strs time: 1481ms
6 item HASHSET strs time: 1418ms
7 item LIST strs time: 1581ms
7 item HASHSET strs time: 1464ms
8 item LIST strs time: 1726ms
8 item HASHSET strs time: 1398ms
9 item LIST strs time: 1901ms
9 item HASHSET strs time: 1433ms
1 item LIST objs time: 614ms
1 item HASHSET objs time: 1993ms
4 item LIST objs time: 837ms
4 item HASHSET objs time: 1914ms
7 item LIST objs time: 1070ms
7 item HASHSET objs time: 1900ms
10 item LIST objs time: 1267ms
10 item HASHSET objs time: 1904ms
13 item LIST objs time: 1494ms
13 item HASHSET objs time: 1893ms
16 item LIST objs time: 1695ms
16 item HASHSET objs time: 1879ms
19 item LIST objs time: 1902ms
19 item HASHSET objs time: 1950ms
22 item LIST objs time: 2136ms
22 item HASHSET objs time: 1893ms
25 item LIST objs time: 2357ms
25 item HASHSET objs time: 1826ms
28 item LIST objs time: 2555ms
28 item HASHSET objs time: 1865ms
31 item LIST objs time: 2755ms
31 item HASHSET objs time: 1963ms
34 item LIST objs time: 3025ms
34 item HASHSET objs time: 1874ms
37 item LIST objs time: 3195ms
37 item HASHSET objs time: 1958ms
40 item LIST objs time: 3401ms
40 item HASHSET objs time: 1855ms
43 item LIST objs time: 3618ms
43 item HASHSET objs time: 1869ms
46 item LIST objs time: 3883ms
46 item HASHSET objs time: 2046ms
49 item LIST objs time: 4218ms
49 item HASHSET objs time: 1873ms
Here is that data displayed as a graph:
Here's the code:
static void Main(string[] args)
{
int times = 10000000;
for (int listSize = 1; listSize < 10; listSize++)
{
List<string> list = new List<string>();
HashSet<string> hashset = new HashSet<string>();
for (int i = 0; i < listSize; i++)
{
list.Add("string" + i.ToString());
hashset.Add("string" + i.ToString());
}
Stopwatch timer = new Stopwatch();
timer.Start();
for (int i = 0; i < times; i++)
{
list.Remove("string0");
list.Add("string0");
}
timer.Stop();
Console.WriteLine(listSize.ToString() + " item LIST strs time: " + timer.ElapsedMilliseconds.ToString() + "ms");
timer = new Stopwatch();
timer.Start();
for (int i = 0; i < times; i++)
{
hashset.Remove("string0");
hashset.Add("string0");
}
timer.Stop();
Console.WriteLine(listSize.ToString() + " item HASHSET strs time: " + timer.ElapsedMilliseconds.ToString() + "ms");
Console.WriteLine();
}
for (int listSize = 1; listSize < 50; listSize+=3)
{
List<object> list = new List<object>();
HashSet<object> hashset = new HashSet<object>();
for (int i = 0; i < listSize; i++)
{
list.Add(new object());
hashset.Add(new object());
}
object objToAddRem = list[0];
Stopwatch timer = new Stopwatch();
timer.Start();
for (int i = 0; i < times; i++)
{
list.Remove(objToAddRem);
list.Add(objToAddRem);
}
timer.Stop();
Console.WriteLine(listSize.ToString() + " item LIST objs time: " + timer.ElapsedMilliseconds.ToString() + "ms");
timer = new Stopwatch();
timer.Start();
for (int i = 0; i < times; i++)
{
hashset.Remove(objToAddRem);
hashset.Add(objToAddRem);
}
timer.Stop();
Console.WriteLine(listSize.ToString() + " item HASHSET objs time: " + timer.ElapsedMilliseconds.ToString() + "ms");
Console.WriteLine();
}
Console.ReadLine();
}
这篇关于HashSet与List性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!