Why is removing by index from an IList performing so much worse than removing by item from an ISet?

Question

Edit: I will add some benchmark results. For roughly 1,000 to 5,000 items in the list, IList.RemoveAt beats ISet.Remove, but that's nothing to worry about since the differences are marginal. The real fun begins when the collection size grows to 10,000 and more. I'm posting only that data.

I was answering a question here last night and faced a bizarre situation.

First a set of simple methods:

static Random rnd = new Random(); // one shared RNG; re-creating Random in a tight loop can repeat seeds

public static int GetRandomIndex<T>(this ICollection<T> source)
{
    return rnd.Next(source.Count); // uniform index in [0, Count)
}

public static T GetRandom<T>(this IList<T> source)
{
    return source[source.GetRandomIndex()];
}
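
For illustration (a usage sketch with example values, not part of the original question), the helpers are used like this:

var letters = new List<string> { "a", "b", "c", "d" };
int idx = letters.GetRandomIndex(); // uniform index in [0, 4)
string pick = letters.GetRandom();  // random element; the list is not modified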

------------------------------------------------------------------------------------------------------------------------------------

Let's say I'm removing N items from a collection at random. I would write this function:

public static void RemoveRandomly1<T>(this ISet<T> source, int countToRemove)
{
    int countToRemain = source.Count - countToRemove;
    var inList = source.ToList(); // snapshot of the set, so we can pick random items by index

    int i = 0;
    while (source.Count > countToRemain)
    {
        // the picked item may already be gone from the set; the loop simply retries
        source.Remove(inList.GetRandom());
        i++;
    }
}

or

public static void RemoveRandomly2<T>(this IList<T> source, int countToRemove)
{
    int countToRemain = source.Count - countToRemove;

    int j = 0;
    while (source.Count > countToRemain)
    {
        // the index comes from the live list, so every iteration removes something
        source.RemoveAt(source.GetRandomIndex());
        j++;
    }
}

As you can see, the first function is written for an ISet and the second for a normal IList. In the first function I'm removing by item from the ISet, and in the second by index from the IList, both of which I believe are O(1). Why is the second function performing so much worse than the first, especially when the lists get bigger?

Odds (my take):

1) In the first function the ISet is converted to an IList (to get a random item from it by index), whereas no such conversion is performed in the second function.

Advantage IList.

2) In the first function a call to GetRandom is made (which in turn calls GetRandomIndex), whereas in the second only GetRandomIndex is called; that's one step less again.

Though trivial, advantage IList.

3) In the first function, the random item is taken from a separate list, so the obtained item might already have been removed from the ISet. This leads to more iterations of the while loop in the first function. In the second function, the random index is taken from the very source that is being mutated, so there are never wasted iterations. I have tested and verified this (a sketch that counts the wasted iterations follows below).

i > j always, advantage IList.
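
To make point 3 concrete, here is a minimal instrumented sketch (an illustrative variant, not the code that was benchmarked) that counts how many Remove calls miss because the picked item is already gone:

public static int CountMissedRemoves<T>(this ISet<T> source, int countToRemove)
{
    int countToRemain = source.Count - countToRemove;
    var inList = source.ToList(); // snapshot; items removed from the set remain in here

    int misses = 0;
    while (source.Count > countToRemain)
    {
        // ICollection<T>.Remove returns false if the item was already removed
        if (!source.Remove(inList.GetRandom()))
            misses++; // a wasted iteration: nothing left the set
    }
    return misses;
}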

I thought the reason for this behaviour was that a List needs constant resizing when items are added or removed. But apparently not, going by some other testing. I ran:

public static void Remove1(this ISet<int> set)
{
    int count = set.Count;
    for (int i = 0; i < count; i++)
    {
        set.Remove(i + 1); // one hash lookup per value 1..count
    }
}

public static void Remove2(this IList<int> lst)
{
    for (int i = lst.Count - 1; i >= 0; i--)
    {
        lst.RemoveAt(i); // always removes the last element, so nothing has to shift
    }
}

and found that the second function runs faster.

Test bed:

var f = Enumerable.Range(1, 100000);

var s = new HashSet<int>(f);
var l = new List<int>(f);

Benchmark(() =>
{
    //some examples...

    s.RemoveRandomly1(2500);
    l.RemoveRandomly2(2500);

    s.Remove1();
    l.Remove2();

}, 1);

public static void Benchmark(Action method, int iterations = 10000)
{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    for (int i = 0; i < iterations; i++)
        method();

    sw.Stop();
    MsgBox.ShowDialog(sw.Elapsed.TotalMilliseconds.ToString()); // MsgBox is a custom dialog helper; Console.WriteLine would do
}

Just trying to understand what's going on with these two structures. Thanks!

Results:

var f = Enumerable.Range(1, 10000);

s.RemoveRandomly1(7500); => 5ms
l.RemoveRandomly2(7500); => 20ms


var f = Enumerable.Range(1, 100000);

s.RemoveRandomly1(7500); => 7ms
l.RemoveRandomly2(7500); => 275ms


var f = Enumerable.Range(1, 1000000);

s.RemoveRandomly1(75000); => 50ms
l.RemoveRandomly2(75000); => 925000ms

For most typical needs, though, a list would do!

Solution

First off, IList and ISet aren't implementations of anything; they are interfaces. I could write IList or ISet implementations that run very differently, so the concrete implementations are what matter (List<T> and HashSet<T> in your case).

Accessing a List item by index is O(1), but removing it with RemoveAt is not: that is O(n), because every element after the removed index has to shift down.
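
To see where the O(n) comes from, here is roughly what RemoveAt does internally; this is a simplified sketch of the general shape, not the actual framework source:

// Inside a simplified List<T>-like class:
private T[] _items; // backing array
private int _size;  // number of slots actually in use

public void RemoveAt(int index)
{
    if ((uint)index >= (uint)_size)
        throw new ArgumentOutOfRangeException(nameof(index));

    _size--;
    if (index < _size)
    {
        // slide every element after 'index' down one slot: O(n - index) copies
        Array.Copy(_items, index + 1, _items, index, _size - index);
    }
    _items[_size] = default(T); // clear the vacated slot
}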

Removing from the end of a List will be fast because it doesn't have to copy anything; it just decrements its internal counter that stores how many items it holds and clears the vacated slot. Once you hit the max capacity of the underlying array, it creates a new array of double the size and copies the elements over. Note that the backing array only grows this way; removals never shrink it automatically, you have to call TrimExcess for that. Because the list tracks its logical length separately, the unused slots at the end of the array appear as if they aren't there.
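
A quick way to watch that growth behaviour; the capacities shown assume the default doubling policy of .NET's List<T>, which starts at 4:

var list = new List<int>();
for (int n = 0; n < 40; n++)
    list.Add(n);                  // capacity grows 4 -> 8 -> 16 -> 32 -> 64

Console.WriteLine(list.Capacity); // 64

list.RemoveRange(0, 35);          // 5 items remain
Console.WriteLine(list.Capacity); // still 64: removals don't shrink the array

list.TrimExcess();                // explicit shrink
Console.WriteLine(list.Capacity); // 5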

Randomly removing from a list means that it will have to copy all the array entries that come after the index so that they slide down one spot, which is inherently pretty slow, particularly as the size of the list gets bigger. If you have a List with 1 million entries, and you remove something at index 500,000, it has to copy the second half of the array down a spot.
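
When the order of the remaining elements doesn't matter, as in the random-removal functions above, a standard workaround is to swap the element to be removed with the last one and then remove from the end, making each removal O(1). A minimal sketch (SwapRemoveAt is an illustrative name, not a framework method):

public static void SwapRemoveAt<T>(this List<T> list, int index)
{
    list[index] = list[list.Count - 1]; // overwrite the victim with the tail element
    list.RemoveAt(list.Count - 1);      // removing the tail shifts nothing
}

With this, a variant of RemoveRandomly2 would stay fast even at a million elements, at the cost of scrambling element order.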
