代码优化:数组与集合 [英] Code optimisation: Arrays vs collections
问题描述
就内存消耗/执行时间而言,将项目添加到组中的更昂贵的方法是什么
Redim Preserve myArray(1 To Ubound(myArray) + 1)
myArray(Ubound(myArray)) = myVal
或
myCollection.Add myVal
最快的是依赖于myVal
,还是随组的大小而变化?还有更快的方法吗?
如果有区别,我可以在类的声明部分中私下声明数组/集合,但是我想知道幕后发生的事情,并且哪种方法通常更快(不是在可读性或可维护性方面,而是在执行时间方面)
测试
确定,运行了一些测试,将许多1的实例添加到组和集合中,我的结果是:
- 收集方法比变体数组快10倍
- 收集方法比长数组快5倍
- 收集方法比字节数组快1.5倍
使用此代码循环约5秒钟,结果全部显示
Sub testtime()
Dim sttime As Double
Dim endtime As Double
Dim i As Long
Dim u As Long
i = 0
ReDim a(1 To 1) 'or Set c = New Collection
sttime = Timer
endtime = Timer + 5
Do Until Timer > endtime
u = UBound(a) + 1
ReDim Preserve a(1 To u)
a(u) = 1
'or c.Add 1
i = i + 1
Loop
endtime = Timer
Debug.Print (endtime - sttime) / i; "s", i; "iterations", Round(endtime - sttime, 3); "(ish) s"
End Sub
因此,看起来好像是在添加具有较大组的项目;添加到集合中的速度更快,但是我想知道为什么吗?
ReDim Preserve歪斜了一切.
ReDim Preserve myArray(1 To UBound(myArray) + 1)
这本质上是低效率的,而且是不公平的比较. 每次添加项目时,您都在内部复制整个数组.我希望Collection
比这更有效率.如果不是,请使用.NET的ArrayList
,由于v2.0引入了泛型和List<T>
,因此在.NET中已不推荐使用ArrayList
,但是在VBA中可用-有用-(.NET泛型不能在VBA中使用).>
ArrayList
不会在每次添加项目时重新调整其内部_items
数组的大小!注意评论:
// Adds the given object to the end of this list. The size of the list is // increased by one. If required, the capacity of the list is doubled // before adding the new element. // public virtual int Add(Object value) { Contract.Ensures(Contract.Result<int>() >= 0); if (_size == _items.Length) EnsureCapacity(_size + 1); _items[_size] = value; _version++; return _size++; } ... // Ensures that the capacity of this list is at least the given minimum // value. If the currect capacity of the list is less than min, the // capacity is increased to twice the current capacity or to min, // whichever is larger. private void EnsureCapacity(int min) { if (_items.Length < min) { int newCapacity = _items.Length == 0? _defaultCapacity: _items.Length * 2; // Allow the list to grow to maximum possible capacity (~2G elements) before encountering overflow. // Note that this check works even when _items.Length overflowed thanks to the (uint) cast if ((uint)newCapacity > Array.MaxArrayLength) newCapacity = Array.MaxArrayLength; if (newCapacity < min) newCapacity = min; Capacity = newCapacity; } }
https://referencesource.microsoft.com/#mscorlib /system/collections/arraylist.cs
我不知道VBA.Collection
的内部结构,但是如果我不得不猜测,我会说它可能具有类似的机制,可以避免在每次添加项目时重新定义内部数组的维数.但这仅是没有意义的,因为除Microsoft外,没人知道VBA.Collection
是如何实现的.
我们可以做的是运行基准测试并进行比较-让我们将一百万个值添加到一个数组,一个集合以及一个ArrayList
的heck中:
Public Sub TestReDimArray()
Dim sut() As Variant
ReDim sut(0 To 0)
Dim i As Long
Dim t As Single
t = Timer
Do While UBound(sut) < 1000000
ReDim Preserve sut(0 To i)
sut(i) = i
i = i + 1
Loop
Debug.Print "ReDimArray added 1M items in " & Timer - t & " seconds."
End Sub
Public Sub TestCollection()
Dim sut As VBA.Collection
Set sut = New VBA.Collection
Dim i As Long
Dim t As Single
t = Timer
Do While sut.Count < 1000000
sut.Add i
i = i + 1
Loop
Debug.Print "Collection added 1M items in " & Timer - t & " seconds."
End Sub
Public Sub TestArrayList()
Dim sut As Object
Set sut = CreateObject("System.Collections.ArrayList")
Dim i As Long
Dim t As Single
t = Timer
Do While sut.Count < 1000000
sut.Add i
i = i + 1
Loop
Debug.Print "ArrayList added 1M items in " & Timer - t & " seconds."
End Sub
这是输出:
ReDimArray added 1M items in 14.90234 seconds.
Collection added 1M items in 0.1875 seconds.
ArrayList added 1M items in 15.64453 seconds.
请注意,引用32位mscorlib.tlb
并尽早绑定ArrayList
并没有多大区别.另外,还有托管/COM互操作开销,并且VBA不支持构造函数,因此ArrayList
的初始化容量为4
,每次达到该容量时都会翻倍,即,当我们插入百万分之一的项目时,我们将内部尺寸调整为排列19次,最终内部容量为1,048,576项.
那么Collection
到底怎么赢了?
因为数组被滥用:调整大小并不是最擅长的方法,而在每次插入都无法顺利进行之前调整大小.
何时使用数组?
在事先知道元素数量的情况下使用数组:
Public Sub TestPopulateArray()
Dim sut(0 To 999999) As Variant
Dim i As Long
Dim t As Single
t = Timer
Do While i < 1000000
sut(i) = i
i = i + 1
Loop
Debug.Print "PopulateArray added 1M items in " & Timer - t & " seconds."
End Sub
输出:
PopulateArray added 1M items in 0.0234375 seconds.
这比将相同数量的项添加到VBA.Collection
快大约10倍-良好使用的数组非常快.
TL; DR
将数组的大小调整为最小-尽可能避免.如果您不知道最终要使用的项目数,请使用Collection
.如果这样做,请使用显式设置的Array
.
In terms of memory consumption/ execution time, what's a more expensive way of adding items to a group
Redim Preserve myArray(1 To Ubound(myArray) + 1)
myArray(Ubound(myArray)) = myVal
Or
myCollection.Add myVal
Does the fastest one depend on myVal
, or vary with the size of the groups? Is there a faster method still?
I have the array/collection declared Privately in the declarations portion of a class if that makes a difference, but I'd like to know what is happening behind the scenes and which approach is generally faster (not in terms of readability or maintainability, just execution time)
Tests
OK, having run some tests adding many instances of 1 to groups and collections, my results are:
- collection method 10 x faster than a variant array
- collection method 5 x faster than a long array
- collection method 1.5 x faster than a byte array
Results were all for approx 5 seconds of looping with this code:
Sub testtime()
Dim sttime As Double
Dim endtime As Double
Dim i As Long
Dim u As Long
i = 0
ReDim a(1 To 1) 'or Set c = New Collection
sttime = Timer
endtime = Timer + 5
Do Until Timer > endtime
u = UBound(a) + 1
ReDim Preserve a(1 To u)
a(u) = 1
'or c.Add 1
i = i + 1
Loop
endtime = Timer
Debug.Print (endtime - sttime) / i; "s", i; "iterations", Round(endtime - sttime, 3); "(ish) s"
End Sub
So it looks like for adding that item, with relatively large groups; adding to a collection is faster, but I'd like to know why?
ReDim Preserve is skewing it all.
ReDim Preserve myArray(1 To UBound(myArray) + 1)
That's inherently inefficient, and an unfair comparison. You're copying the entire array internally, every time you add an item. I would hope a Collection
is much more efficient than that. If not, use .NET's ArrayList
, which is deprecated in .NET since v2.0 introduced generics and List<T>
, but usable - and useful - in VBA (.NET generics can't be used in VBA).
An ArrayList
doesn't resize its internal _items
array every single time an item is added! Notice the comments:
// Adds the given object to the end of this list. The size of the list is // increased by one. If required, the capacity of the list is doubled // before adding the new element. // public virtual int Add(Object value) { Contract.Ensures(Contract.Result<int>() >= 0); if (_size == _items.Length) EnsureCapacity(_size + 1); _items[_size] = value; _version++; return _size++; } ... // Ensures that the capacity of this list is at least the given minimum // value. If the currect capacity of the list is less than min, the // capacity is increased to twice the current capacity or to min, // whichever is larger. private void EnsureCapacity(int min) { if (_items.Length < min) { int newCapacity = _items.Length == 0? _defaultCapacity: _items.Length * 2; // Allow the list to grow to maximum possible capacity (~2G elements) before encountering overflow. // Note that this check works even when _items.Length overflowed thanks to the (uint) cast if ((uint)newCapacity > Array.MaxArrayLength) newCapacity = Array.MaxArrayLength; if (newCapacity < min) newCapacity = min; Capacity = newCapacity; } }
https://referencesource.microsoft.com/#mscorlib/system/collections/arraylist.cs
I don't know about the internals of a VBA.Collection
, but if I had to guess, I would say it likely has a similar mechanism that avoids re-dimensioning the internal array every single time an item is added. But that's all moot, since nobody except Microsoft knows how VBA.Collection
is implemented.
What we can do though, is run benchmarks and compare - let's add one million values to an array, a collection, and heck, an ArrayList
:
Public Sub TestReDimArray()
Dim sut() As Variant
ReDim sut(0 To 0)
Dim i As Long
Dim t As Single
t = Timer
Do While UBound(sut) < 1000000
ReDim Preserve sut(0 To i)
sut(i) = i
i = i + 1
Loop
Debug.Print "ReDimArray added 1M items in " & Timer - t & " seconds."
End Sub
Public Sub TestCollection()
Dim sut As VBA.Collection
Set sut = New VBA.Collection
Dim i As Long
Dim t As Single
t = Timer
Do While sut.Count < 1000000
sut.Add i
i = i + 1
Loop
Debug.Print "Collection added 1M items in " & Timer - t & " seconds."
End Sub
Public Sub TestArrayList()
Dim sut As Object
Set sut = CreateObject("System.Collections.ArrayList")
Dim i As Long
Dim t As Single
t = Timer
Do While sut.Count < 1000000
sut.Add i
i = i + 1
Loop
Debug.Print "ArrayList added 1M items in " & Timer - t & " seconds."
End Sub
Here's the output:
ReDimArray added 1M items in 14.90234 seconds.
Collection added 1M items in 0.1875 seconds.
ArrayList added 1M items in 15.64453 seconds.
Note that referencing the 32-bit mscorlib.tlb
and early-binding the ArrayList
doesn't make much of a difference. Plus there's managed/COM interop overhead, and VBA doesn't support constructors, so the ArrayList
initializes with a capacity of 4
that doubles every time capacity is reached, i.e. when we insert the millionth item we've resized the internal array 19 times and end up with an internal capacity of 1,048,576 items.
So how is Collection
winning by that much anyway?
Because the array is being abused: resizing isn't what arrays do best, and resizing before every insert can't possibly go well.
When to use arrays?
Use arrays when you know the number of elements in advance:
Public Sub TestPopulateArray()
Dim sut(0 To 999999) As Variant
Dim i As Long
Dim t As Single
t = Timer
Do While i < 1000000
sut(i) = i
i = i + 1
Loop
Debug.Print "PopulateArray added 1M items in " & Timer - t & " seconds."
End Sub
Output:
PopulateArray added 1M items in 0.0234375 seconds.
That's roughly 10 times faster than adding the same number of items to a VBA.Collection
- well-used arrays are blazingly fast.
TL;DR
Keep array resizing to a minimum - avoid it as much as possible. If you don't know the number of items you're going to end up with, use a Collection
. If you do, use an explicitly sized Array
.
这篇关于代码优化:数组与集合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!