ArrayList和HashSet插入性能测试结果让我困惑 [英] ArrayList and HashSet insert performance test result confuse me
问题描述
我wirte一个类来测试arraylist和hashset之间的插入性能,正如我所料,哈希表插入性能将比arraylist(可能是书欺骗我),但测试结果让我这么困惑
i wirte a class to test the insert performance between arraylist and hashset,as i expect ,the hashset insert performance will be much better than arraylist(maybe the book deceived me),but the test result make me so confused
HashSet<String> hashSet = new HashSet<String>();
long start = System.currentTimeMillis();
for (int i = 0; i < 900000; i++) {
hashSet.add(String.valueOf(i));
}
System.out.println("Insert HashSet Time: " + (System.currentTimeMillis() - start));
ArrayList<String> arrayList = new ArrayList<String>();
start = System.currentTimeMillis();
for (int i = 0; i < 900000; i++) {
arrayList.add(String.valueOf(i));
}
System.out.println("Insert ArrayList Time: " + (System.currentTimeMillis() - start));
result:
Insert HashSet Time: 978
Insert ArrayList Time: 287
我运行这个主metod很多次,结果没有更多的不同之处,插入数组列表的时间比插入哈希表的时间短很多
任何人都可以解释这个奇怪的结果。
i run this main metod many times and the result has no more different between this,the insert arraylist time is much shorter than insert hashset time can anybody explain this weird result.
推荐答案
数据结构和算法的确切性能特征高度依赖于机器和实现。但是,对于我来说, ArrayList
插入将比通过常量因子插入 HashSet
快得多。要插入 ArrayList
,您只需要在数组中的特定索引处设置一个值。要插入到哈希集中,您需要为插入的项目计算哈希码,并将其映射到数组索引,检查索引并根据查找结果执行某些操作,最后插入数组。此外, HashSet
将具有更差的内存位置,因此您将更频繁地获取缓存缺失。
Exact performance characteristics of datastructures and algorithms are highly machine- and implementation-specific. However, it doesn't seem surprising to me that ArrayList
inserts would be faster than HashSet
inserts by a constant factor. To insert into an ArrayList
, you just need to set a value at a particular index in an array. To insert into a hash set, you need to compute a hashcode for the inserted item and map that to an array index, check that index and possibly perform some action based on what you find, and finally insert into the array. Furthermore the HashSet
will have worse memory locality so you'll get cache misses more often.
这两个数据结构都需要做,但是两个数据结构都需要以大约相同的速率调整大小(由于重新散列,哈希表调整大小也可能更加昂贵,因为常数因子)。
There's also the question of array resizing, which both data structures will need to do, but both data structures will need to resize at about the same rate (and hash table resizing is probably more expensive by a constant factor, too, due to rehashing).
两种算法都是恒定的(预期的)时间,但是有一个散列表比数组列表更多的东西。所以,它是不奇怪的,它会慢一个恒定的因素。 (再次,确切的差别在很大程度上取决于机器和实现。)
Both algorithms are constant (expected) time, but there's a lot more stuff to do with a hash table than an array list. So it's not surprising that it would be slower by a constant factor. (Again, the exact difference is highly dependent on machine and implementation.)
这篇关于ArrayList和HashSet插入性能测试结果让我困惑的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!