有没有办法加速这个VBA算法? [英] Is there any way I can speed up this VBA algorithm?

查看:116
本文介绍了有没有办法加速这个VBA算法?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻求实施VBA trie -building algorithm that能够在相当短的时间内(少于15-20秒)处理大量的英文词典(约50,000字)。由于我是一个C ++程序员,实践(这是我第一次做任何实质的VBA工作),我建立了一个快速的概念验证程序,可以在我的电脑上完成大约半秒钟的任务。当它来测试VBA端口的时候,花了差不多两分钟的时间来完成这个工作 - 对我来说这是一个不可接受的很长一段时间。 VBA代码如下:

I am looking to implement a VBA trie-building algorithm that is able to process a substantial English lexicon (~50,000 words) in a relatively short amount of time (less than 15-20 seconds). Since I am a C++ programmer by practice (and this is my first time doing any substantial VBA work), I built a quick proof-of-concept program that was able to complete the task on my computer in about half a second. When it came time to test the VBA port however, it took almost two minutes to do the same -- an unacceptably long amount of time for my purposes. The VBA code is below:

节点类模块:

Public letter As String
Public next_nodes As New Collection
Public is_word As Boolean

主模块:

Dim tree As Node

Sub build_trie()
    Set tree = New Node
    Dim file, a, b, c As Integer
    Dim current As Node
    Dim wordlist As Collection
    Set wordlist = New Collection
    file = FreeFile
    Open "C:\corncob_caps.txt" For Input As file
    Do While Not EOF(file)
        Dim line As String
        Line Input #file, line
        wordlist.add line
    Loop
    For a = 1 To wordlist.Count
        Set current = tree
        For b = 1 To Len(wordlist.Item(a))
            Dim match As Boolean
            match = False
            Dim char As String
            char = Mid(wordlist.Item(a), b, 1)
            For c = 1 To current.next_nodes.Count
                If char = current.next_nodes.Item(c).letter Then
                    Set current = current.next_nodes.Item(c)
                    match = True
                    Exit For
                End If
            Next c
            If Not match Then
                Dim new_node As Node
                Set new_node = New Node
                new_node.letter = char
                current.next_nodes.add new_node
                Set current = new_node
            End If
        Next b
        current.is_word = True
    Next a
End Sub

我的问题是简单的,这个算法可以加快吗?我从一些消息来看,VBA Collection s不像 Dictionary s那样有效,所以我尝试了一个词典的实现,但是花费相当的时间来完成更糟糕的内存使用(在我的电脑上使用Excel的500+ MB)。正如我所说,我对VBA非常新鲜,所以我对其语法以及其整体特征/限制的了解非常有限 - 这就是为什么我不认为这个算法效率可能会很高;任何提示/建议将不胜感激。

My question then is simply, can this algorithm be sped up? I saw from some sources that VBA Collections are not as efficient as Dictionarys and so I attempted a Dictionary-based implementation instead but it took an equal amount of time to complete with even worse memory usage (500+ MB of RAM used by Excel on my computer). As I say I am extremely new to VBA so my knowledge of both its syntax as well as its overall features/limitations is very limited -- which is why I don't believe that this algorithm is as efficient as it could possibly be; any tips/suggestions would be greatly appreciated.

提前感谢

注意: >该代码引用的词典文件corncob_caps.txt可以 here (下载所有CAPS文件)

NB: The lexicon file referred to by the code, "corncob_caps.txt", is available here (download the "all CAPS" file)

推荐答案

这里有一些小问题和几个较大的机会。你确实说这是你的第一个vba工作,所以原谅我,如果我告诉你你已经知道的事情

There are a number of small issues and a few larger opportunities here. You did say this is your first vba work, so forgive me if I'm telling you things you already know

小事情:

Dim file,a,b,c As Integer 将文件,a和b声明为变体。 整数是16位符号,所以可能会有溢出的风险,请改用 Long

Small things first:
Dim file, a, b, c As Integer declares file, a and b as variants. Integer is 16 bit sign, so there may be risk of overflows, use Long instead.

内部循环是有效的:与C ++不同,它们不是循环范围的。

DIM'ing inside loops is counter-productive: unlike C++ they are not loop scoped.

真正的机会是:

使用对于每个,您可以在其中迭代集合:其更快比索引。

Use For Each where you can to iterate collections: its faster than indexing.

在我的硬件上,你的原始代码运行在大约160s。这个代码在大约2.5s(两个加上时间加载word文件到集合,约4s)

On my hardware your original code ran in about 160s. This code in about 2.5s (both plus time to load word file into the collection, about 4s)

Sub build_trie()
    Dim t1 As Long
    Dim wd As Variant
    Dim nd As Node

    Set tree = New Node
    ' Dim file, a, b, c As Integer  : declares file, a, b as variant
    Dim file As Integer, a As Long, b As Long, c As Long     ' Integer is 16 bit signed

    Dim current As Node
    Dim wordlist As Collection
    Set wordlist = New Collection
    file = FreeFile
    Open "C:\corncob_caps.txt" For Input As file

    ' no point in doing inside loop, they are not scoped to the loop
    Dim line As String
    Dim match As Boolean
    Dim char As String
    Dim new_node As Node

    Do While Not EOF(file)
        'Dim line As String
        Line Input #file, line
        wordlist.Add line
    Loop


    t1 = GetTickCount
    For Each wd In wordlist ' for each is faster
    'For a = 1 To wordlist.Count
        Set current = tree
        For b = 1 To Len(wd)
            'Dim match As Boolean
            match = False
            'Dim char As String
            char = Mid$(wd, b, 1)
            For Each nd In current.next_nodes
            'For c = 1 To current.next_nodes.Count
                If char = nd.letter Then
                'If char = current.next_nodes.Item(c).letter Then
                    Set current = nd
                    'Set current = current.next_nodes.Item(c)
                    match = True
                    Exit For
                End If
            Next nd
            If Not match Then
                'Dim new_node As Node
                Set new_node = New Node
                new_node.letter = char
                current.next_nodes.Add new_node
                Set current = new_node
            End If
        Next b
        current.is_word = True
    Next wd

    Debug.Print "Time = " & GetTickCount - t1 & " ms"
End Sub

编辑:

将单词列表加载到动态数组将减少加载时间到次秒。请注意,Redim Preserve是昂贵的,所以在块中

loading the word list into a dynamic array will reduce load time to sub second. Be aware that Redim Preserve is expensive, so do it in chunks

    Dim i As Long, sz As Long
    sz = 10000
    Dim wordlist() As String
    ReDim wordlist(0 To sz)

    file = FreeFile
    Open "C:\corncob_caps.txt" For Input As file

    i = 0
    Do While Not EOF(file)
        'Dim line As String
        Line Input #file, line
        wordlist(i) = line
        i = i + 1
        If i > sz Then
            sz = sz + 10000
            ReDim Preserve wordlist(0 To sz)
        End If
        'wordlist.Add line
    Loop
    ReDim Preserve wordlist(0 To i - 1)

然后循环通过它像

    For i = 0 To UBound(wordlist)
        wd = wordlist(i)

这篇关于有没有办法加速这个VBA算法?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆