List<T>.AddRange implementation suboptimal


Problem description

Profiling my C# application indicated that significant time is spent in List<T>.AddRange. Using Reflector to look at the code in this method indicated that it calls List<T>.InsertRange which is implemented as such:

public void InsertRange(int index, IEnumerable<T> collection)
{
    if (collection == null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
    }
    if (index > this._size)
    {
        ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.index, ExceptionResource.ArgumentOutOfRange_Index);
    }
    ICollection<T> is2 = collection as ICollection<T>;
    if (is2 != null)
    {
        int count = is2.Count;
        if (count > 0)
        {
            this.EnsureCapacity(this._size + count);
            if (index < this._size)
            {
                Array.Copy(this._items, index, this._items, index + count, this._size - index);
            }
            if (this == is2)
            {
                Array.Copy(this._items, 0, this._items, index, index);
                Array.Copy(this._items, (int) (index + count), this._items, (int) (index * 2), (int) (this._size - index));
            }
            else
            {
                T[] array = new T[count];          // (*)
                is2.CopyTo(array, 0);              // (*)
                array.CopyTo(this._items, index);  // (*)
            }
            this._size += count;
        }
    }
    else
    {
        using (IEnumerator<T> enumerator = collection.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                this.Insert(index++, enumerator.Current);
            }
        }
    }
    this._version++;
}

private T[] _items;

One can argue that the simplicity of the interface (only having one overload of InsertRange) justifies the performance overhead of runtime type checking and casting. But what could be the reason behind the 3 lines I have marked with (*)? I think they could be rewritten as the faster alternative:

is2.CopyTo(this._items, index);

Do you see any reason for not using this simpler and apparently faster alternative?

Edit:

Thanks for the answers. So the consensus is that this is a protective measure against the input collection implementing CopyTo in a defective or malicious manner. To me it seems like overkill to constantly pay the price of 1) runtime type checking, 2) dynamic allocation of the temporary array, and 3) copying the elements twice, when all of this could have been avoided by defining two or more additional overloads of InsertRange: one taking IEnumerable<T> as now, a second taking List<T>, and a third taking T[]. The latter two could have been implemented to run around twice as fast as the current code.

Edit 2:

I implemented a class FastList, identical to List, except that it also provides an overload of AddRange which takes a T[] argument. This overload needs neither the dynamic type check nor the double copying of elements. I profiled FastList.AddRange against List.AddRange by adding 4-byte arrays 1000 times to a list which was initially empty. My implementation beats the standard List.AddRange by a factor of 9 (nine!). List.AddRange takes about 5% of the runtime in one of the important usage scenarios of our application, so replacing List with a class providing a faster AddRange could improve application runtime by about 4%.
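The FastList source is not shown in the question, so the following is only a minimal sketch of what such an overload might look like. The field names, the initial capacity of 4, and the doubling growth policy are assumptions made for illustration, not the author's code.

```csharp
using System;

// Hypothetical minimal FastList<T>; only the parts relevant to AddRange(T[]).
public sealed class FastList<T>
{
    private T[] _items = new T[4];   // assumed initial capacity
    private int _size;

    public int Count => _size;

    public T this[int index]
    {
        get
        {
            if ((uint)index >= (uint)_size)
                throw new ArgumentOutOfRangeException(nameof(index));
            return _items[index];
        }
    }

    // Array overload: no runtime type test, no temporary array, a single copy.
    public void AddRange(T[] items)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        if (items.Length == 0) return;

        EnsureCapacity(_size + items.Length);
        Array.Copy(items, 0, _items, _size, items.Length);
        _size += items.Length;
    }

    // Grows the backing array (assumed policy: at least double).
    private void EnsureCapacity(int min)
    {
        if (_items.Length >= min) return;
        Array.Resize(ref _items, Math.Max(_items.Length * 2, min));
    }
}
```

Because the source is a plain array, Array.Copy performs its own bounds checks and no user code runs during the copy, which is why this overload can skip the temporary buffer.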

Solution

They are preventing the implementation of ICollection<T> from accessing indices of the destination list outside the bounds of the insertion. With the implementation above, a faulty (or "manipulative") implementation of CopyTo that writes outside the temporary array gets an exception (for example an IndexOutOfRangeException) instead of touching the list's own storage.

Keep in mind that T[].CopyTo is quite literally internally implemented as memcpy, so the performance overhead of adding that line is minute. When you have such a low cost of adding safety to a tremendous number of calls, you might as well do so.
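To make the protective effect concrete, here is a small sketch using plain arrays in place of the list's private storage. SloppyCollection, DirectCopy, and BufferedCopy are hypothetical names invented for this illustration; they are not part of the framework.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection whose CopyTo violates its contract by also
// overwriting the slot just before the requested start index.
public sealed class SloppyCollection : ICollection<int>
{
    private readonly int[] _data = { 7, 8, 9 };

    public int Count => _data.Length;
    public bool IsReadOnly => true;

    public void CopyTo(int[] array, int arrayIndex)
    {
        array[arrayIndex - 1] = -1;   // contract violation
        Array.Copy(_data, 0, array, arrayIndex, _data.Length);
    }

    public bool Contains(int item) => Array.IndexOf(_data, item) >= 0;
    public IEnumerator<int> GetEnumerator() => ((IEnumerable<int>)_data).GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => _data.GetEnumerator();
    public void Add(int item) => throw new NotSupportedException();
    public void Clear() => throw new NotSupportedException();
    public bool Remove(int item) => throw new NotSupportedException();
}

public static class CopyToDemo
{
    // The variant proposed in the question: the collection writes
    // straight into the backing store.
    public static void DirectCopy(int[] backing, int index, ICollection<int> source)
    {
        source.CopyTo(backing, index);
    }

    // The three (*) lines: the collection only ever sees a temporary array,
    // so an out-of-range write fails there and never reaches the backing store.
    public static void BufferedCopy(int[] backing, int index, ICollection<int> source)
    {
        int[] temp = new int[source.Count];
        source.CopyTo(temp, 0);       // SloppyCollection throws here (index -1)
        temp.CopyTo(backing, index);
    }
}
```

With a backing array of { 1, 2, 0, 0, 0, 0 } and index 2, DirectCopy silently overwrites the existing element 2 with -1, whereas BufferedCopy throws an IndexOutOfRangeException and leaves the backing array untouched.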

Edit: The part I find strange is the fact that the call to ICollection<T>.CopyTo (copying to the temporary array) does not occur immediately following the call to EnsureCapacity. If it were moved to that location, then following any synchronous exception the list would remain unchanged. As-is, that condition only holds if the insertion happens at the end of the list. The reasoning here is:

  • All necessary allocation happens before altering the list elements.
  • The calls to Array.Copy cannot fail because
    • The memory is already allocated
    • The bounds are already checked
    • The element types of the source and destination arrays match
    • There is no "copy constructor" used like in C++ - it's just a memcpy
  • The only items that can throw an exception are the external call to ICollection.CopyTo and the allocations required for resizing the list and allocating the temporary array. If all three of these occur before moving elements for the insertion, the transaction to change the list cannot throw a synchronous exception.
  • Final note: This addresses strictly exceptional behavior - the above rationale does not add thread-safety.
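The ordering argued for in the list above can be sketched roughly as follows. This is an illustration only: the items and size parameters stand in for List<T>'s private _items and _size fields, the growth policy is an assumption, and the self-insertion case (inserting a list into itself) is deliberately left out.

```csharp
using System;
using System.Collections.Generic;

public static class SafeInsert
{
    // Hypothetical helper: inserts 'source' at 'index' so that a synchronous
    // exception leaves 'items' and 'size' unchanged.
    public static void InsertRange<T>(ref T[] items, ref int size, int index, ICollection<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if ((uint)index > (uint)size) throw new ArgumentOutOfRangeException(nameof(index));

        int count = source.Count;
        if (count == 0) return;

        // Step 1: everything that can throw happens before the list is touched.
        T[] temp = new T[count];          // allocation of the temporary array
        source.CopyTo(temp, 0);           // external call into user code
        if (items.Length < size + count)  // allocation for resizing
        {
            Array.Resize(ref items, Math.Max(items.Length * 2, size + count));
        }

        // Step 2: only in-bounds copies between arrays of matching element type.
        Array.Copy(items, index, items, index + count, size - index);
        Array.Copy(temp, 0, items, index, count);
        size += count;
    }
}
```

For example, starting from items = { 1, 2, 3, 0 } with size = 3, inserting a List<int> containing 8 and 9 at index 1 gives the logical contents 1, 8, 9, 2, 3.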

Edit 2 (response to the OP's edit): Have you profiled this? You are making some bold claims that Microsoft should have chosen a more complicated API, so you should make sure you're correct in the assertions that the current method is slow. I've never had a problem with the performance of InsertRange, and I'm quite sure that any performance problems someone does face with it will be better resolved with an algorithm redesign than by reimplementing the dynamic list. Just so you don't take me as being harsh in a negative way, keep the following in mind:

  • I can't stand people on my dev team who like to reinvent the square wheel.
  • I definitely want people on my team that care about potential performance issues, and ask questions about the side effects their code may have. This point wins out when present - but as long as people are asking questions I will drive them to turn their questions into solid answers. If you can show me that an application gains a significant advantage through what initially appears to be a bad idea, then that's just the way things go sometimes.
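One way to turn such a question into a measurement is a small timing harness like the one below. It is a rough sketch: AddRangeTiming, Measure, and AppendChunks are hypothetical names, the workload (1000 appends of a 4-element byte array) is one reading of the scenario described in the question, and a Stopwatch loop is no substitute for a proper profiler or benchmarking library.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class AddRangeTiming
{
    // Runs 'action' once as a warm-up (so JIT compilation is not timed),
    // then returns the total elapsed time for 'repetitions' further runs.
    public static TimeSpan Measure(Action action, int repetitions)
    {
        action();
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < repetitions; i++)
        {
            action();
        }
        watch.Stop();
        return watch.Elapsed;
    }

    // Workload: append a 4-element byte array 'chunks' times to an empty list.
    public static int AppendChunks(int chunks)
    {
        var list = new List<byte>();
        byte[] chunk = { 1, 2, 3, 4 };
        for (int i = 0; i < chunks; i++)
        {
            list.AddRange(chunk);
        }
        return list.Count;
    }
}
```

A call such as AddRangeTiming.Measure(() => AddRangeTiming.AppendChunks(1000), 10000) gives a number that can be compared against the same workload run on an alternative list implementation, in a release build and on the runtime actually used in production.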
