VBA memory size of Arrays and ArrayList
Question
I have tried to load 6.000.000 (6 mio) strings, each 64 characters long, in order to sort them in VBA. What I have noticed is:
1. When I use an Array, the memory occupied is around 916 MB.
2. When I use an ArrayList, the memory occupied goes up to 1.105 MB (about 1.1 GB).
Neither figure seems reasonable to me, as the total string size is only around 380 MB. What am I doing wrong? As the number of strings will grow rapidly, I will hit 'Out of memory' very soon. Any ideas are welcome.
Demeters
Recommended answer
Most of the issue is that VBA natively uses BSTRs, which are Unicode strings. I assume that your calculation of ~380 MB is based on 6 million * 64 characters at 1 byte each. In actuality, the math works out to something like this:
VBA Strings are Unicode, which in this case means each character is 2 bytes.
A String in VBA takes 4 bytes for the length stored internally before the string, 2 bytes for a Unicode null at the end of the string, and 2 bytes per character.
That works out to 4 + (64 * 2) + 2 = 134 bytes per 64-character String.
Each entry in the String array is actually a pointer to the String, so that's another 4 bytes per slot in 32-bit VBA (8 bytes in 64-bit Office), 138 in total so far.
Assuming 6 million of these Strings, that's 828,000,000 bytes (using commas US style) which, depending upon your definition of MB, is either 789.6 MB (at 1,048,576 bytes per MB) or 828 MB (at 1,000,000 bytes per MB).
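The arithmetic above can be sketched as a small VBA helper. This is only an illustration of the estimate, assuming a 32-bit host; the names EstimateStringArrayBytes and PrintEstimate are made up for this example and the constants simply restate the figures from the breakdown.

```vba
Option Explicit

' Rough size of a String array, following the breakdown above:
' 4-byte length prefix + 2 bytes per character + 2-byte terminator
' + one pointer per array slot. Allocator overhead is not included.
Public Function EstimateStringArrayBytes(ByVal stringCount As Long, _
                                         ByVal charsPerString As Long) As Double
    Const LENGTH_PREFIX_BYTES As Long = 4
    Const BYTES_PER_CHAR As Long = 2
    Const TERMINATOR_BYTES As Long = 2
    Const POINTER_BYTES As Long = 4   ' assumption: 32-bit host (8 on 64-bit)

    Dim bytesPerString As Long
    bytesPerString = LENGTH_PREFIX_BYTES + charsPerString * BYTES_PER_CHAR _
                     + TERMINATOR_BYTES + POINTER_BYTES

    ' Multiply as Double so larger counts do not overflow a Long
    EstimateStringArrayBytes = CDbl(stringCount) * bytesPerString
End Function

Public Sub PrintEstimate()
    Dim totalBytes As Double
    totalBytes = EstimateStringArrayBytes(6000000, 64)
    Debug.Print Format$(totalBytes, "#,##0") & " bytes"
    Debug.Print Format$(totalBytes / 1024# / 1024#, "0.0") & " MB (binary)"
End Sub
```

Running PrintEstimate from the Immediate window prints the estimate in bytes and in binary megabytes; the thousands separator depends on your regional settings.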
I'm not sure where the rest of the overhead comes from; perhaps garbage collector reference counters?
Anyway, I would suggest that you use 64-slot Byte arrays to load and store your strings, assuming they contain only ASCII characters. You eliminate (4 + 64 + 2) * 6,000,000 bytes, and your code will presumably run faster because it doesn't need to compare as many bytes. You could probably optimize your sort further by comparing a word (32 or 64 bits, depending upon your processor) at a time instead of character by character.
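As a rough illustration of that suggestion, the sketch below converts a String into a fixed 64-byte array with StrConv and compares the payload sizes. ToFixedBytes and CompareSizes are hypothetical names for this example, and the conversion assumes ASCII/ANSI text; other characters may not survive it.

```vba
Option Explicit

' Converts a string to a fixed 64-byte array, one byte per character.
' Assumption: the text is ASCII/ANSI. Shorter strings are padded with
' zero bytes and longer ones are truncated.
Public Function ToFixedBytes(ByVal sourceText As String) As Byte()
    Dim buffer() As Byte

    If Len(sourceText) = 0 Then
        ReDim buffer(0 To 63)            ' nothing to convert: all zero bytes
    Else
        buffer = StrConv(sourceText, vbFromUnicode)
        ReDim Preserve buffer(0 To 63)   ' pad or truncate to 64 bytes
    End If

    ToFixedBytes = buffer
End Function

Public Sub CompareSizes()
    Dim s As String
    Dim b() As Byte

    s = String$(64, "A")
    b = ToFixedBytes(s)

    ' LenB reports the character data only: 2 bytes per character
    Debug.Print "String payload:     " & LenB(s) & " bytes"
    Debug.Print "Byte array payload: " & (UBound(b) - LBound(b) + 1) & " bytes"
End Sub
```

For a 64-character string the character data drops from 128 bytes to 64; neither number includes the per-string or per-array bookkeeping.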
Update
I think I was slightly wrong on that calculation. Byte arrays are SAFEARRAYs, which have quite a bit of overhead themselves, about 20 bytes. So the savings would be closer to (4 + 64 + 2 - 20) * 6,000,000.
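The corrected savings can be worked through the same way. This is a minimal sketch; EstimateByteArraySavings is a hypothetical helper, and the 20-byte SAFEARRAY overhead is the approximate figure quoted above rather than a measured value.

```vba
Option Explicit

' Estimated bytes saved by storing each string as a Byte array instead
' of a String, using the figures above: the 4-byte length prefix, one
' byte per character, and the 2-byte terminator are saved, minus the
' per-array SAFEARRAY overhead.
Public Function EstimateByteArraySavings(ByVal stringCount As Long, _
                                         ByVal charsPerString As Long, _
                                         ByVal safeArrayOverhead As Long) As Double
    Dim savedPerString As Long
    savedPerString = 4 + charsPerString + 2 - safeArrayOverhead

    ' Multiply as Double so larger counts do not overflow a Long
    EstimateByteArraySavings = CDbl(stringCount) * savedPerString
End Function

Public Sub PrintSavings()
    ' 20 is the approximate overhead mentioned above (an assumption)
    Debug.Print EstimateByteArraySavings(6000000, 64, 20) & " bytes saved"
End Sub
```

With these inputs the saving is 50 bytes per string, or 300,000,000 bytes in total.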
Before you look at this example, please, please take my recommendation and import your text into Access to sort it instead. 6 million strings totaling 380 MB is well within Access' limits, and Access can (as I understand it) sort them without loading all the strings into memory at the same time.
Create a text file called "data.txt" with the following text:
This
Is
A
File
Of
Strings
To
Sort
Add a code module, call it "mdlQuickSort", and add the following code. I haven't commented much, but if you're curious as to how it works you can read Wikipedia's article on QuickSort, or let me know and I'll add better comments.
Option Explicit

'Sorts a zero-based array of 64-byte arrays in place.
Public Sub QuickSortInPlace(ByRef arrArray() As Variant)
    'Nothing to do for fewer than two elements
    If UBound(arrArray) < 1 Then
        Exit Sub
    End If
    qSort arrArray, 0, UBound(arrArray)
End Sub

Private Sub qSort(ByRef arrArray() As Variant, left As Long, right As Long)
    Dim pivot As Long
    Dim newPivotIndex As Long
    If left < right Then
        pivot = MedianOf3(arrArray, left, right)
        newPivotIndex = partition(arrArray, left, right, pivot)
        qSort arrArray, left, newPivotIndex - 1
        qSort arrArray, newPivotIndex + 1, right
    End If
End Sub

Private Function partition(ByRef arrArray() As Variant, left As Long, right As Long, pivot As Long) As Long
    Dim pivotValue As Variant
    pivotValue = arrArray(pivot)
    'Park the pivot at the end while partitioning
    Swap arrArray, pivot, right
    Dim storeIndex As Long
    storeIndex = left
    Dim i As Long
    For i = left To right - 1
        If CompareFunc(arrArray(i), pivotValue) = -1 Then
            Swap arrArray, i, storeIndex
            storeIndex = storeIndex + 1
        End If
    Next
    'Move the pivot to its final position
    Swap arrArray, storeIndex, right
    partition = storeIndex
End Function

Private Sub Swap(ByRef arrArray() As Variant, indexA As Long, indexB As Long)
    Dim temp As Variant
    temp = arrArray(indexA)
    arrArray(indexA) = arrArray(indexB)
    arrArray(indexB) = temp
End Sub

'Moves the median of the first, middle and last elements to the middle
'slot and returns that index.
Private Function MedianOf3(ByRef arrArray() As Variant, left As Long, right As Long) As Long
    Dim a As Variant, b As Variant, c As Variant
    Dim indexA As Long, indexB As Long, indexC As Long
    Dim ab As Long
    Dim bc As Long
    Dim ac As Long
    indexA = left
    indexB = (left + right) \ 2
    indexC = right
    a = arrArray(indexA)
    b = arrArray(indexB)
    c = arrArray(indexC)
    ab = CompareFunc(a, b)
    bc = CompareFunc(b, c)
    ac = CompareFunc(a, c)
    If ab = -1 Then
        If ac = -1 Then
            If bc = -1 Or bc = 0 Then
                'a b c
                'Already in B
            Else
                'a c b
                Swap arrArray, indexB, indexC
            End If
        Else
            'c a b
            Swap arrArray, indexA, indexB
        End If
    Else
        If bc = -1 Then
            If ac = -1 Then
                'b a c
                Swap arrArray, indexA, indexB
            Else
                'b c a
                Swap arrArray, indexB, indexC
            End If
        Else
            'c b a
            'Already in B
        End If
    End If
    MedianOf3 = indexB
End Function

'Compares two 64-byte arrays byte by byte.
'Returns -1 if str_a sorts first, 1 if str_b sorts first, 0 if equal.
Private Function CompareFunc(str_a As Variant, str_b As Variant) As Long
    Dim a As Byte
    Dim b As Byte
    Dim i As Long
    For i = 0 To 63
        a = str_a(i)
        b = str_b(i)
        If a <> b Then
            Exit For
        End If
    Next
    'i is 64 only if the loop finished without finding a difference
    If i <= 63 Then
        If a < b Then
            CompareFunc = -1
        Else
            CompareFunc = 1
        End If
    Else
        CompareFunc = 0
    End If
End Function
Finally, add a module called "mdlMain". This is where the Strings are loaded. Here is the code:
Option Explicit

Public Sub Main()
    Dim arrStrings() As Variant
    Dim i As Long
    'Get the strings from the file
    FillArrStringsInPlace arrStrings
    'Print the unsorted list
    Debug.Print "Unsorted Strings" & vbCrLf & "---------------------"
    For i = 0 To UBound(arrStrings)
        Debug.Print StrConv(arrStrings(i), vbUnicode)
    Next
    'Sort in place
    QuickSortInPlace arrStrings
    'Print the sorted list
    Debug.Print vbCrLf & vbCrLf & "Sorted Strings" & vbCrLf & "---------------------"
    For i = 0 To UBound(arrStrings)
        Debug.Print StrConv(arrStrings(i), vbUnicode)
    Next
End Sub

Public Sub FillArrStringsInPlace(ByRef arr() As Variant)
    Dim iFile As Integer
    Dim strInput As String
    Dim lineCount As Long
    Dim arrBytes() As Byte
    'Open a file called "data.txt" in the same folder as this workbook
    iFile = FreeFile
    Open ActiveWorkbook.Path & "\data.txt" For Input As #iFile
    'Since I already know how many strings there are, I'm assigning it here.
    'The alternatives would be to either "dynamically resize" the array, which
    'is equivalent to copying the entire thing every time you add a new string,
    'or to count the number of newlines in the file and dimension the array
    'to that size before reading in the strings line by line. Neither is as
    'efficient as just defining it beforehand.
    ReDim arr(0 To 7)
    While Not EOF(iFile)
        Line Input #iFile, strInput
        'One byte per character, padded with zero bytes (or truncated) to 64
        arrBytes = StrConv(strInput, vbFromUnicode)
        ReDim Preserve arrBytes(0 To 63)
        arr(lineCount) = arrBytes
        lineCount = lineCount + 1
    Wend
    Close #iFile
End Sub
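The comments in the loader mention counting the lines first as an alternative to hard-coding the array size. A minimal sketch of that alternative follows; CountLines is a hypothetical helper that is not part of the code above, and it reads the file once just to count, so the file ends up being read twice.

```vba
Option Explicit

' Counts the lines in a text file so that the array can be dimensioned
' once before loading. Hypothetical helper for illustration only.
Public Function CountLines(ByVal filePath As String) As Long
    Dim iFile As Integer
    Dim strLine As String
    Dim lineTotal As Long

    iFile = FreeFile
    Open filePath For Input As #iFile
    Do While Not EOF(iFile)
        Line Input #iFile, strLine
        lineTotal = lineTotal + 1
    Loop
    Close #iFile

    CountLines = lineTotal
End Function
```

It could then replace the fixed size with something like ReDim arr(0 To CountLines(ActiveWorkbook.Path & "\data.txt") - 1), provided the file contains at least one line.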
I had put some code in there to try to optimize things with CopyMemory, but it was a tad dangerous, so I decided to leave it out.