代码优化:数组与集合 [英] Code optimisation: Arrays vs collections

查看:70
本文介绍了代码优化:数组与集合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

就内存消耗/执行时间而言,将项目添加到组中的更昂贵的方法是什么

Redim Preserve myArray(1 To Ubound(myArray) + 1)
myArray(Ubound(myArray)) = myVal

myCollection.Add myVal

最快的是依赖于myVal,还是随组的大小而变化?还有更快的方法吗?

如果有区别,我可以在类的声明部分中私下声明数组/集合,但是我想知道幕后发生的事情,并且哪种方法通常更快(不是在可读性或可维护性方面,而是在执行时间方面)

测试

确定,运行了一些测试,将许多1的实例添加到组和集合中,我的结果是:

  • 收集方法比变体数组快10倍
  • 收集方法比长数组快5倍
  • 收集方法比字节数组快1.5倍

使用此代码循环约5秒钟,结果全部显示

Sub testtime()
Dim sttime As Double
Dim endtime As Double
Dim i As Long
Dim u As Long
i = 0
ReDim a(1 To 1) 'or Set c = New Collection
sttime = Timer
endtime = Timer + 5
Do Until Timer > endtime
    u = UBound(a) + 1
    ReDim Preserve a(1 To u)
    a(u) = 1
    'or c.Add 1
    i = i + 1
Loop
endtime = Timer
Debug.Print (endtime - sttime) / i; "s", i; "iterations", Round(endtime - sttime, 3); "(ish) s"
End Sub

因此,看起来好像是在添加具有较大组的项目;添加到集合中的速度更快,但是我想知道为什么吗?

解决方案

ReDim Preserve歪斜了一切.

ReDim Preserve myArray(1 To UBound(myArray) + 1)

这本质上是低效率的,而且是不公平的比较. 每次添加项目时,您都在内部复制整个数组.我希望Collection比这更有效率.如果不是,请使用.NET的ArrayList,由于v2.0引入了泛型和List<T>,因此在.NET中已不推荐使用ArrayList,但是在VBA中可用-有用-(.NET泛型不能在VBA中使用).

ArrayList不会在每次添加项目时重新调整其内部_items数组的大小!注意评论:

 // Adds the given object to the end of this list. The size of the list is
// increased by one. If required, the capacity of the list is doubled
// before adding the new element.
//
public virtual int Add(Object value) {
    Contract.Ensures(Contract.Result<int>() >= 0);
    if (_size == _items.Length) EnsureCapacity(_size + 1);
    _items[_size] = value;
    _version++;
    return _size++;
}

...

// Ensures that the capacity of this list is at least the given minimum
// value. If the currect capacity of the list is less than min, the
// capacity is increased to twice the current capacity or to min,
// whichever is larger.
private void EnsureCapacity(int min) {
    if (_items.Length < min) {
        int newCapacity = _items.Length == 0? _defaultCapacity: _items.Length * 2;
        // Allow the list to grow to maximum possible capacity (~2G elements) before encountering overflow.
        // Note that this check works even when _items.Length overflowed thanks to the (uint) cast
        if ((uint)newCapacity > Array.MaxArrayLength) newCapacity = Array.MaxArrayLength;
        if (newCapacity < min) newCapacity = min;
        Capacity = newCapacity;
    }
}
 

https://referencesource.microsoft.com/#mscorlib /system/collections/arraylist.cs

我不知道VBA.Collection的内部结构,但是如果我不得不猜测,我会说它可能具有类似的机制,可以避免在每次添加项目时重新定义内部数组的维数.但这仅是没有意义的,因为除Microsoft外,没人知道VBA.Collection是如何实现的.

我们可以做的是运行基准测试并进行比较-让我们将一百万个值添加到一个数组,一个集合以及一个ArrayList的heck中:

 Public Sub TestReDimArray()
    Dim sut() As Variant
    ReDim sut(0 To 0)
    Dim i As Long
    Dim t As Single
    t = Timer
    Do While UBound(sut) < 1000000
        ReDim Preserve sut(0 To i)
        sut(i) = i
        i = i + 1
    Loop
    Debug.Print "ReDimArray added 1M items in " & Timer - t & " seconds."
End Sub

Public Sub TestCollection()
    Dim sut As VBA.Collection
    Set sut = New VBA.Collection
    Dim i As Long
    Dim t As Single
    t = Timer
    Do While sut.Count < 1000000
        sut.Add i
        i = i + 1
    Loop
    Debug.Print "Collection added 1M items in " & Timer - t & " seconds."
End Sub

Public Sub TestArrayList()
    Dim sut As Object
    Set sut = CreateObject("System.Collections.ArrayList")
    Dim i As Long
    Dim t As Single
    t = Timer
    Do While sut.Count < 1000000
        sut.Add i
        i = i + 1
    Loop
    Debug.Print "ArrayList added 1M items in " & Timer - t & " seconds."
End Sub
 

这是输出:

 ReDimArray added 1M items in 14.90234 seconds.
Collection added 1M items in 0.1875 seconds.
ArrayList added 1M items in 15.64453 seconds.
 

请注意,引用32位mscorlib.tlb并尽早绑定ArrayList并没有多大区别.另外,还有托管/COM互操作开销,并且VBA不支持构造函数,因此ArrayList的初始化容量为4,每次达到该容量时都会翻倍,即,当我们插入百万分之一的项目时,我们将内部尺寸调整为排列19次,最终内部容量为1,048,576项.

那么Collection到底怎么赢了?

因为数组被滥用:调整大小并不是最擅长的方法,而在每次插入都无法顺利进行之前调整大小.

何时使用数组?

在事先知道元素数量的情况下使用数组:

 Public Sub TestPopulateArray()
    Dim sut(0 To 999999) As Variant
    Dim i As Long
    Dim t As Single
    t = Timer
    Do While i < 1000000
        sut(i) = i
        i = i + 1
    Loop
    Debug.Print "PopulateArray added 1M items in " & Timer - t & " seconds."
End Sub
 

输出:

 PopulateArray added 1M items in 0.0234375 seconds.
 

这比将相同数量的项添加到VBA.Collection快大约10倍-良好使用的数组非常快.


TL; DR

将数组的大小调整为最小-尽可能避免.如果您不知道最终要使用的项目数,请使用Collection.如果这样做,请使用显式设置的Array.

In terms of memory consumption/ execution time, what's a more expensive way of adding items to a group

Redim Preserve myArray(1 To Ubound(myArray) + 1)
myArray(Ubound(myArray)) = myVal

Or

myCollection.Add myVal

Does the fastest one depend on myVal, or vary with the size of the groups? Is there a faster method still?

I have the array/collection declared Privately in the declarations portion of a class if that makes a difference, but I'd like to know what is happening behind the scenes and which approach is generally faster (not in terms of readability or maintainability, just execution time)

Tests

OK, having run some tests adding many instances of 1 to groups and collections, my results are:

  • collection method 10 x faster than a variant array
  • collection method 5 x faster than a long array
  • collection method 1.5 x faster than a byte array

Results were all for approx 5 seconds of looping with this code:

Sub testtime()
Dim sttime As Double
Dim endtime As Double
Dim i As Long
Dim u As Long
i = 0
ReDim a(1 To 1) 'or Set c = New Collection
sttime = Timer
endtime = Timer + 5
Do Until Timer > endtime
    u = UBound(a) + 1
    ReDim Preserve a(1 To u)
    a(u) = 1
    'or c.Add 1
    i = i + 1
Loop
endtime = Timer
Debug.Print (endtime - sttime) / i; "s", i; "iterations", Round(endtime - sttime, 3); "(ish) s"
End Sub

So it looks like for adding that item, with relatively large groups; adding to a collection is faster, but I'd like to know why?

解决方案

ReDim Preserve is skewing it all.

ReDim Preserve myArray(1 To UBound(myArray) + 1)

That's inherently inefficient, and an unfair comparison. You're copying the entire array internally, every time you add an item. I would hope a Collection is much more efficient than that. If not, use .NET's ArrayList, which is deprecated in .NET since v2.0 introduced generics and List<T>, but usable - and useful - in VBA (.NET generics can't be used in VBA).

An ArrayList doesn't resize its internal _items array every single time an item is added! Notice the comments:

// Adds the given object to the end of this list. The size of the list is
// increased by one. If required, the capacity of the list is doubled
// before adding the new element.
//
public virtual int Add(Object value) {
    Contract.Ensures(Contract.Result<int>() >= 0);
    if (_size == _items.Length) EnsureCapacity(_size + 1);
    _items[_size] = value;
    _version++;
    return _size++;
}

...

// Ensures that the capacity of this list is at least the given minimum
// value. If the currect capacity of the list is less than min, the
// capacity is increased to twice the current capacity or to min,
// whichever is larger.
private void EnsureCapacity(int min) {
    if (_items.Length < min) {
        int newCapacity = _items.Length == 0? _defaultCapacity: _items.Length * 2;
        // Allow the list to grow to maximum possible capacity (~2G elements) before encountering overflow.
        // Note that this check works even when _items.Length overflowed thanks to the (uint) cast
        if ((uint)newCapacity > Array.MaxArrayLength) newCapacity = Array.MaxArrayLength;
        if (newCapacity < min) newCapacity = min;
        Capacity = newCapacity;
    }
}

https://referencesource.microsoft.com/#mscorlib/system/collections/arraylist.cs

I don't know about the internals of a VBA.Collection, but if I had to guess, I would say it likely has a similar mechanism that avoids re-dimensioning the internal array every single time an item is added. But that's all moot, since nobody except Microsoft knows how VBA.Collection is implemented.

What we can do though, is run benchmarks and compare - let's add one million values to an array, a collection, and heck, an ArrayList:

Public Sub TestReDimArray()
    Dim sut() As Variant
    ReDim sut(0 To 0)
    Dim i As Long
    Dim t As Single
    t = Timer
    Do While UBound(sut) < 1000000
        ReDim Preserve sut(0 To i)
        sut(i) = i
        i = i + 1
    Loop
    Debug.Print "ReDimArray added 1M items in " & Timer - t & " seconds."
End Sub

Public Sub TestCollection()
    Dim sut As VBA.Collection
    Set sut = New VBA.Collection
    Dim i As Long
    Dim t As Single
    t = Timer
    Do While sut.Count < 1000000
        sut.Add i
        i = i + 1
    Loop
    Debug.Print "Collection added 1M items in " & Timer - t & " seconds."
End Sub

Public Sub TestArrayList()
    Dim sut As Object
    Set sut = CreateObject("System.Collections.ArrayList")
    Dim i As Long
    Dim t As Single
    t = Timer
    Do While sut.Count < 1000000
        sut.Add i
        i = i + 1
    Loop
    Debug.Print "ArrayList added 1M items in " & Timer - t & " seconds."
End Sub

Here's the output:

ReDimArray added 1M items in 14.90234 seconds.
Collection added 1M items in 0.1875 seconds.
ArrayList added 1M items in 15.64453 seconds.

Note that referencing the 32-bit mscorlib.tlb and early-binding the ArrayList doesn't make much of a difference. Plus there's managed/COM interop overhead, and VBA doesn't support constructors, so the ArrayList initializes with a capacity of 4 that doubles every time capacity is reached, i.e. when we insert the millionth item we've resized the internal array 19 times and end up with an internal capacity of 1,048,576 items.

So how is Collection winning by that much anyway?

Because the array is being abused: resizing isn't what arrays do best, and resizing before every insert can't possibly go well.

When to use arrays?

Use arrays when you know the number of elements in advance:

Public Sub TestPopulateArray()
    Dim sut(0 To 999999) As Variant
    Dim i As Long
    Dim t As Single
    t = Timer
    Do While i < 1000000
        sut(i) = i
        i = i + 1
    Loop
    Debug.Print "PopulateArray added 1M items in " & Timer - t & " seconds."
End Sub

Output:

PopulateArray added 1M items in 0.0234375 seconds.

That's roughly 10 times faster than adding the same number of items to a VBA.Collection - well-used arrays are blazingly fast.


TL;DR

Keep array resizing to a minimum - avoid it as much as possible. If you don't know the number of items you're going to end up with, use a Collection. If you do, use an explicitly sized Array.

这篇关于代码优化:数组与集合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆