Why is removing by index from an IList performing so much worse than removing by item from an ISet?
Edit: I will add some benchmark results. For about 1,000–5,000 items in the list, IList and RemoveAt beat ISet and Remove, but that's not something to worry about since the differences are marginal. The real fun begins when the collection size extends to 10,000 and more. I'm posting only those data.
I was answering a question here last night and faced a bizarre situation.
First a set of simple methods:
static Random rnd = new Random();
public static int GetRandomIndex<T>(this ICollection<T> source)
{
return rnd.Next(source.Count);
}
public static T GetRandom<T>(this IList<T> source)
{
return source[source.GetRandomIndex()];
}
------------------------------------------------------------------------------------------------------------------------------------
Let's say I'm removing N number of items from a collection randomly. I would write this function:
public static void RemoveRandomly1<T>(this ISet<T> source, int countToRemove)
{
int countToRemain = source.Count - countToRemove;
var inList = source.ToList();
int i = 0;
while (source.Count > countToRemain)
{
source.Remove(inList.GetRandom());
i++;
}
}
or
public static void RemoveRandomly2<T>(this IList<T> source, int countToRemove)
{
int countToRemain = source.Count - countToRemove;
int j = 0;
while (source.Count > countToRemain)
{
source.RemoveAt(source.GetRandomIndex());
j++;
}
}
As you can see, the first function is written for an ISet and the second for a normal IList. In the first function I'm removing by item from the ISet, and in the second by index from the IList, both of which I believe are O(1). Why is the second function performing so much worse than the first, especially as the lists get bigger?
Odds (my take):

1) In the first function the ISet is converted to an IList (to get the random item from the IList), whereas no such conversion is performed in the second function. Advantage IList.

2) In the first function a call to GetRandom is made, whereas in the second only a call to GetRandomIndex is made; that's one step less again. Though trivial, advantage IList.

3) In the first function, the random item is taken from a separate list, so the obtained item might already have been removed from the ISet. This leads to more iterations of the while loop in the first function. In the second function, the random index comes from the source that is being iterated on, hence there are never repeated iterations. I have tested and verified this. i > j always; advantage IList.
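Point 3 can be checked directly: HashSet&lt;T&gt;.Remove returns false when the element is already gone, so the wasted attempts in RemoveRandomly1 can be counted. A minimal sketch (the failedRemovals counter is mine, not part of the original code):

```csharp
var set = new HashSet<int>(Enumerable.Range(1, 10));
var snapshot = set.ToList();   // like inList in RemoveRandomly1
var rnd = new Random();

int failedRemovals = 0;
while (set.Count > 5)
{
    // Pick from the stale snapshot, as RemoveRandomly1 does;
    // the picked item may already have been removed from the set.
    int candidate = snapshot[rnd.Next(snapshot.Count)];
    if (!set.Remove(candidate))
        failedRemovals++;      // these are the extra while-loop iterations
}
```

On most runs failedRemovals ends up greater than zero, which is exactly why i &gt; j in the comparison above.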
I thought the reason for this behaviour was that a List would need constant resizing when items are added or removed. But apparently not, according to some other testing. I ran:
public static void Remove1(this ISet<int> set)
{
int count = set.Count;
for (int i = 0; i < count; i++)
{
set.Remove(i + 1);
}
}
public static void Remove2(this IList<int> lst)
{
for (int i = lst.Count - 1; i >= 0; i--)
{
lst.RemoveAt(i);
}
}
and found that the second function runs faster.
Test bed:
var f = Enumerable.Range(1, 100000);
var s = new HashSet<int>(f);
var l = new List<int>(f);
Benchmark(() =>
{
//some examples...
s.RemoveRandomly1(2500);
l.RemoveRandomly2(2500);
s.Remove1();
l.Remove2();
}, 1);
public static void Benchmark(Action method, int iterations = 10000)
{
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < iterations; i++)
method();
sw.Stop();
MsgBox.ShowDialog(sw.Elapsed.TotalMilliseconds.ToString());
}
Just trying to understand what's going on with the two structures. Thanks!
Result:
var f = Enumerable.Range(1, 10000);
s.RemoveRandomly1(7500); => 5ms
l.RemoveRandomly2(7500); => 20ms
var f = Enumerable.Range(1, 100000);
s.RemoveRandomly1(7500); => 7ms
l.RemoveRandomly2(7500); => 275ms
var f = Enumerable.Range(1, 1000000);
s.RemoveRandomly1(75000); => 50ms
l.RemoveRandomly2(75000); => 925000ms
For most typical needs a list would do though..!
First off, IList and ISet aren't implementations of anything. I can write an IList or an ISet implementation that will run very differently, so the concrete implementations are what matter (List&lt;T&gt; and HashSet&lt;T&gt; in your case).
Accessing a List item by index is O(1), but removing it by RemoveAt is O(n).
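The O(n) cost comes from the shift. Conceptually, RemoveAt has to do something like the following — a simplified sketch of the idea, not the actual BCL source:

```csharp
// Simplified model of what List<T>.RemoveAt(index) must do:
// every element after `index` is copied one slot to the left.
static void RemoveAtSketch<T>(T[] items, ref int size, int index)
{
    size--;
    if (index < size)
    {
        // O(n - index) copy: this is where the time goes
        Array.Copy(items, index + 1, items, index, size - index);
    }
    items[size] = default(T);  // clear the vacated slot
}
```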
Removing from the end of a List will be fast because it doesn't have to copy anything; it just decrements its internal counter that stores how many items it has and clears the vacated slot. It tracks how large it is with that count, so unused slots in the underlying array appear as if they aren't there. Note that the underlying array only ever grows: once you hit its capacity, a new array of double the size is allocated and the elements are copied over, but removing items never shrinks the capacity automatically — you have to call TrimExcess yourself.
Randomly removing from a list means it has to copy all the array entries that come after the removed index so that they slide down one spot, which is inherently slow, particularly as the list gets bigger. If you have a List with 1 million entries and you remove something at index 500,000, it has to copy the entire second half of the array down a spot.
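If the order of the list doesn't matter — and it doesn't when you're deleting random items — that copy can be avoided entirely: overwrite the chosen index with the last element, then remove from the end. This is a common trick, sketched here under the assumption that reordering is acceptable; it makes each random removal O(1):

```csharp
public static void RemoveRandomlySwap<T>(this List<T> source, int countToRemove)
{
    var rnd = new Random();
    for (int n = 0; n < countToRemove; n++)
    {
        int i = rnd.Next(source.Count);
        int last = source.Count - 1;
        source[i] = source[last];  // overwrite with the last item
        source.RemoveAt(last);     // removing from the end copies nothing
    }
}
```

Since no elements ever have to slide down, this list version should land in the same ballpark as the HashSet one even at a million items.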