VBA memory size of Arrays and ArrayList

Question

I have tried to load 6,000,000 (6 million) strings, each 64 characters long, in order to sort them in VBA. What I have noticed is:

1. When I use an Array, the memory occupied is around 916 MB.
2. When I use an ArrayList, the memory occupied goes up to 1,105 MB.

Neither seems reasonable to me, as the strings themselves total around 380 MB. What am I doing wrong? As the number of strings will grow rapidly, I will face 'Out of memory' very soon. Any ideas are welcome.

Demetres

Recommended answer

Most of the issue is that VBA natively uses BSTRs, which are Unicode strings. I assume that your calculation of ~380 MB is based on 6 million * 64 characters at 1 byte each. In actuality, the math works out to something like this:


  • VBA Strings are Unicode, which in this case means each character is 2 bytes.

  • A String in VBA takes 4 bytes for the length stored internally before the string, 2 bytes for a Unicode null at the end of the string, and 2 bytes per character.

  • That works out to 4 + (64 * 2) + 2 = 134 bytes per 64-character String.

  • Each entry in the String array is actually a pointer to the String, so that's another 4 bytes per slot (in 32-bit VBA), 138 in total so far.

  • Assuming 6 million of these Strings, that's 828,000,000 bytes (using commas US style), which, depending upon your definition of MB, is either 789.6 or 828 MB.
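The arithmetic in the list above can be written as a small helper. This is only a rough sketch: `EstimateStringArrayBytes`, `ShowEstimate`, and the parameter names are made up for illustration, and the default 4-byte pointer assumes 32-bit VBA.

```vb
Option Explicit

' Hypothetical helper: estimated bytes used by an array of equal-length Strings.
Public Function EstimateStringArrayBytes(ByVal stringCount As Long, _
                                         ByVal charsPerString As Long, _
                                         Optional ByVal pointerBytes As Long = 4) As Double
    Dim bytesPerString As Double

    ' 4-byte length prefix + 2 bytes per character + 2-byte null terminator
    bytesPerString = 4 + CDbl(charsPerString) * 2 + 2

    ' Each array slot also holds a pointer to its String
    EstimateStringArrayBytes = CDbl(stringCount) * (bytesPerString + pointerBytes)
End Function

Public Sub ShowEstimate()
    Dim total As Double

    total = EstimateStringArrayBytes(6000000, 64)
    Debug.Print total & " bytes"
    Debug.Print (total / 1024 / 1024) & " MiB"
End Sub
```

For 6 million strings of 64 characters this gives the 828,000,000 bytes (about 789.6 MiB) quoted above. A Double is used for the result so that larger counts do not overflow a Long.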

I'm not sure about the rest of the overhead; perhaps reference counters for the garbage collector?

Anyway, I would suggest that you use 64-element Byte arrays to load and store your strings, assuming they contain only ASCII characters. You eliminate (4 + 64 + 2) * 6,000,000 bytes, and your code will presumably run faster because it doesn't need to compare as many bytes. You could probably optimize your sort further by comparing a word (32 or 64 bits, depending upon your processor) at a time instead of one character at a time.
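As a quick illustration of why a Byte array holds the same ASCII text in half the character storage, here is a minimal sketch. The procedure name `CompareStorageSizes` is hypothetical; `StrConv`, `LenB`, and `String$` are standard VBA functions.

```vb
Option Explicit

Public Sub CompareStorageSizes()
    Dim s As String
    Dim b() As Byte

    s = String$(64, "A")            ' 64 ASCII characters
    b = StrConv(s, vbFromUnicode)   ' convert to one byte per character

    Debug.Print LenB(s)                     ' 128: the String stores 2 bytes per character
    Debug.Print UBound(b) - LBound(b) + 1   ' 64: the Byte array stores 1 byte per character
End Sub
```

These figures cover the character data only; the fixed per-String and per-array overheads come on top of them.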

Update

I think I was slightly wrong on that calculation. Byte arrays are SAFEARRAYs, which have quite a bit of overhead themselves, about 20 bytes each. So the savings would be closer to (4 + 64 + 2 - 20) * 6,000,000 bytes.
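Worked through as code, the revised estimate looks roughly like this. The function name and constants are illustrative only, and the 20-byte SAFEARRAY overhead is the approximate figure given above, not a measured value.

```vb
Option Explicit

' Hypothetical helper: estimated bytes saved by storing 64-character ASCII
' strings as Byte arrays instead of VBA Strings.
Public Function EstimatedSavingsBytes(ByVal stringCount As Long) As Double
    Const STRING_BYTES As Long = 4 + 64 * 2 + 2               ' 134 bytes per String
    Const SAFEARRAY_OVERHEAD As Long = 20                     ' approximate, per the text
    Const BYTE_ARRAY_BYTES As Long = 64 + SAFEARRAY_OVERHEAD  ' 84 bytes per Byte array

    EstimatedSavingsBytes = CDbl(stringCount) * (STRING_BYTES - BYTE_ARRAY_BYTES)
End Function
```

With 6,000,000 strings this comes to 50 * 6,000,000 = 300,000,000 bytes, or roughly 286 MiB.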

Before you look at the sorting example below, please, please take my recommendation and import your text into Access to sort it instead. 6 million strings for a total of 380 MB is well within Access's limits, and Access can (as I understand it) sort them without loading all the strings into memory at the same time.
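A rough sketch of what the Access route might look like follows. It assumes the code runs inside Access (not Excel), that the strings sit one per line in a text file next to the database (named data.txt here), and that the lines contain no commas or double quotes, since a default delimited import would otherwise split or alter them. The table name `tblStrings` and the procedure name are hypothetical.

```vb
Option Explicit

' Runs inside Access. "tblStrings" is a made-up table name.
Public Sub ImportAndSort()
    Dim rs As DAO.Recordset

    ' Import each line of the file as one row. With HasFieldNames = False,
    ' Access names the imported column F1.
    DoCmd.TransferText acImportDelim, , "tblStrings", _
                       CurrentProject.Path & "\data.txt", False

    ' Let the database engine sort instead of sorting in VBA memory.
    Set rs = CurrentDb.OpenRecordset("SELECT F1 FROM tblStrings ORDER BY F1")
    Do While Not rs.EOF
        Debug.Print rs.Fields("F1").Value
        rs.MoveNext
    Loop
    rs.Close
End Sub
```

Note that ORDER BY follows the database's sort order, which is typically case-insensitive, so the result can differ from a byte-by-byte comparison.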

Create a text file called "data.txt" with the following text:

This
Is
A
File
Of
Strings
To
Sort

Add a code module, call it "mdlQuickSort", and add the following code. I haven't commented much, but if you're curious about how it works, you can read Wikipedia's article on Quicksort or let me know and I'll add better comments.

Option Explicit

Public Sub QuickSortInPlace(ByRef arrArray() As Variant)
    'Nothing to sort when the 0-based array has fewer than two elements
    If UBound(arrArray) < 1 Then
        Exit Sub
    End If
    qSort arrArray, 0, UBound(arrArray)
End Sub

Private Sub qSort(ByRef arrArray() As Variant, left As Long, right As Long)
    Dim pivot As Long
    Dim newPivotIndex As Long
    If left < right Then
        pivot = MedianOf3(arrArray, left, right)
        newPivotIndex = partition(arrArray, left, right, pivot)
        qSort arrArray, left, newPivotIndex - 1
        qSort arrArray, newPivotIndex + 1, right
    End If
End Sub

Private Function partition(ByRef arrArray() As Variant, left As Long, right As Long, pivot As Long) As Long
    Dim pivotValue As Variant
    pivotValue = arrArray(pivot)
    Swap arrArray, pivot, right
    Dim storeIndex As Long
    storeIndex = left
    Dim i As Long
    For i = left To right - 1
        If CompareFunc(arrArray(i), pivotValue) = -1 Then
            Swap arrArray, i, storeIndex
            storeIndex = storeIndex + 1
        End If
    Next
    Swap arrArray, storeIndex, right
    partition = storeIndex
End Function

Private Sub Swap(ByRef arrArray() As Variant, indexA As Long, indexB As Long)
    Dim temp As Variant
    temp = arrArray(indexA)
    arrArray(indexA) = arrArray(indexB)
    arrArray(indexB) = temp
End Sub

Private Function MedianOf3(ByRef arrArray() As Variant, left As Long, right As Long) As Long
    Dim a As Variant, b As Variant, c As Variant
    Dim indexA As Long, indexB As Long, indexC As Long
    Dim ab As Long
    Dim bc As Long
    Dim ac As Long
    indexA = left
    indexB = (left + right) \ 2
    indexC = right
    a = arrArray(indexA)
    b = arrArray(indexB)
    c = arrArray(indexC)

    ab = CompareFunc(a, b)
    bc = CompareFunc(b, c)
    ac = CompareFunc(a, c)

    If ab = -1 Then
        If ac = -1 Then
            If bc = -1 Or bc = 0 Then
                'a b c
                'Median (b) is already at indexB
            Else
                'a c b
                Swap arrArray, indexB, indexC
            End If
        Else
            'c a b
            Swap arrArray, indexA, indexB
        End If
    Else
        If bc = -1 Then
            If ac = -1 Then
                'b a c
                Swap arrArray, indexA, indexB
            Else
                'b c a
                Swap arrArray, indexB, indexC
            End If
        Else
            'c b a
            'Median (b) is already at indexB
        End If
    End If
    MedianOf3 = indexB
End Function

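'Compares two 64-element Byte arrays byte by byte.
'Returns -1 if str_a sorts first, 1 if str_b sorts first, 0 if they are equal.
Private Function CompareFunc(str_a As Variant, str_b As Variant) As Long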
    Dim a As Byte
    Dim b As Byte
    Dim i As Long

    For i = 0 To 63
        a = str_a(i)
        b = str_b(i)
        If a <> b Then
            Exit For
        End If
    Next
    If i <= 63 Then
        If a < b Then
            CompareFunc = -1
        Else
            CompareFunc = 1
        End If
    Else
        CompareFunc = 0
    End If

End Function

Finally, add a module called "mdlMain". This is where the Strings are loaded. Here is the code:

Option Explicit

Public Sub Main()
    Dim arrStrings() As Variant
    Dim i As Long

    'Get the strings from the file
    FillArrStringsInPlace arrStrings

    'Print the unsorted list
    Debug.Print "Unsorted Strings" & vbCrLf & "---------------------"
    For i = 0 To UBound(arrStrings)
        Debug.Print StrConv(arrStrings(i), vbUnicode)
    Next

    'Sort in place
    QuickSortInPlace arrStrings

    'Print the sorted list
    Debug.Print vbCrLf & vbCrLf & "Sorted Strings" & vbCrLf & "---------------------"
    For i = 0 To UBound(arrStrings)
        Debug.Print StrConv(arrStrings(i), vbUnicode)
    Next
End Sub

Public Sub FillArrStringsInPlace(ByRef arr() As Variant)
    Dim iFile As Integer
    Dim strInput As String
    Dim lineCount As Long
    Dim arrBytes() As Byte

    'Open a file called "data.txt" in the same folder as this workbook
    '(in Access, use CurrentProject.Path instead of ActiveWorkbook.Path)
    iFile = FreeFile
    Open ActiveWorkbook.Path & "\data.txt" For Input As iFile

    'Since I already know how many strings there are, I'm assigning the size here.
    'The alternatives would be either to "dynamically resize" the array, which
    'is equivalent to copying the entire thing every time you add a new string,
    'or to count the number of lines in the file and dimension the array
    'to that size before reading in the strings line by line.  Neither is as
    'efficient as just defining it beforehand.
    ReDim arr(0 To 7)

    While Not EOF(iFile)
        Line Input #iFile, strInput
        arrBytes = StrConv(strInput, vbFromUnicode)
        ReDim Preserve arrBytes(0 To 63)
        arr(lineCount) = arrBytes
        lineCount = lineCount + 1
    Wend

    Close iFile
End Sub
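The comment in FillArrStringsInPlace mentions counting the lines first as an alternative to hard-coding the array size. A minimal sketch of that option is below; `CountLines` is a hypothetical helper and not part of the original answer.

```vb
Option Explicit

' Hypothetical helper: returns the number of lines in a text file.
Public Function CountLines(ByVal filePath As String) As Long
    Dim iFile As Integer
    Dim strLine As String
    Dim n As Long

    iFile = FreeFile
    Open filePath For Input As #iFile
    Do While Not EOF(iFile)
        Line Input #iFile, strLine
        n = n + 1
    Loop
    Close #iFile

    CountLines = n
End Function
```

The array could then be sized with something like `ReDim arr(0 To CountLines(path) - 1)`, after checking that the count is greater than zero. The cost is reading the file twice, which is the trade-off the comment describes.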

I had put some code in there to try to optimize things with CopyMemory, but it was a tad dangerous, so I decided to leave it out.
