脚本之外的性能替代方法 [英] Performance alternative over Scripting.Dictionary
问题描述
我正在用几个按钮在Excel-VBA中对Manager进行编码.
I am coding a Manager in Excel-VBA with several buttons.
其中之一是使用另一个Excel文件(我称其为 T
)作为输入来生成标签.
One of them is to generate a tab using another Excel file (let me call it T
) as input.
T
的某些属性:
-
〜90MB
大小 -
〜350K
行 - 包含最近
14
个月内的销售数据(无序). - 相关列:
-
年/月
- 总钱
- 卖家名称
- 家庭用品
- 客户端名称
~90MB
size~350K
lines- Contains sales data of the last
14
months (unordered). - Relevant columns:
year/month
- total-money
- seller-name
- family-product
- client-name
主要关系:
- 卖家向许多客户出售许多产品
我正在使用卖方分组的最近
年/月
的T
中的数据生成一个新的Excel标签.I am generating a new Excel tab with data from
T
of the lastyear/month
grouped by Seller.重要说明:
-
T
是唯一可用的输入/源. - 如果两个或多个卖方将同一产品卖给同一客户,则
总金额
应计入所有这些卖方.
T
is the only available input/source.- If two or more Sellers sells the same Product to the same Client, the
total-money
should be counted to all of those Sellers.
这就足够了,现在您知道我已经已编码了.
This is enough, now you know what I have already coded.
我的代码可以正常运行,但是,它大约需要4分钟的运行时间.
My code works, but, it takes about 4 minutes of runtime.
我已经使用较小的代码源(不大于
2MB
)对其他一些按钮进行了编码,这些代码在5秒钟内即可运行.I have already coded some other buttons using smaller sources (not greater than
2MB
) which runs in 5 seconds.考虑到
T
的大小,可以接受4分钟的运行时间.Considering
T
size, 4 minutes runtime could be acceptable.但我对此并不感到骄傲,至少现在还没有.
But I'm not proud of it, at least not yet.
我的代码主要基于
Scripting.Dictionary
来映射T
中的数据,然后对obj中的每个键使用...下一个键
将分组的数据设置为新创建的标签.My code is mainly based on
Scripting.Dictionary
to map data fromT
, and then I usefor each key in obj ... next key
to set the grouped data to the new created tab.我不确定,但这是我的想法:
I'm not sure, but here are my thoughts:
- 如果
N
是Scripting.Dictionary
中的总键,那么我需要在汇总之前检查obj.Exists(str)
总金额
.它将运行N
字符串比较以返回false
. - 类似地,当我执行
Set Seller = obj(seller_name)
时,它将运行maximunN
个字符串.
- If
N
is the total keys in aScripting.Dictionary
, and I need to check forobj.Exists(str)
before aggregatingtotal-money
. It will runN
string compares to returnfalse
. - Similarly it will run maximun
N
string compares when I doSet seller = obj(seller_name)
.
我想错了我的想法.但是,如果我没记错的话,减少此功能运行时间的下一步(也是最后的希望)是使用
I want to be wrong with my thoughts. But if I'm not wrong, my next step (and last hope) to reduce the runtime of this function is to code my own class object with
Tries
.我明天才开始编码,我只是想以正确的方式进行确认,或者以错误的方式提供一些建议.
I will only start coding tomorrow, what I want is just some confirmation if I am in the right way, or some advices if I am in the wrong way of doing it.
您有什么建议吗?预先感谢.
Do you have any suggestions? Thanks in advance.
推荐答案
超出了内存限制
简而言之:
- 主要问题是因为我使用了一种动态编程方法来存储信息(预处理),从而缩短了执行时间.
- 我的代码现在可以在
〜13秒内运行
.
有些事情我们很难学习.但我很高兴找到答案.
There are things we learn the hard way. But I'm glad I found the answer.
- 使用任务管理器,我可以看到我的代码达到100%的内存使用率.
- 我上面提到的使用
Scripting.Dictionary
的DP方法的速度真正提高了100%. - 我上面提到的使用我自己的
cls_trie
实现的DP方法也达到了100%,但比第一个要晚. - 与以上尝试的总时间
〜2-3分钟
相比,这解释了〜4-5分钟
. - 在任务管理器中,我还可以看到CPU使用率从未达到
2%
.
- Using the Task Manager I was able to see my code reaching 100% memory usage.
- The DP approach I mentioned above using
Scripting.Dictionary
reached 100% really faster. - The DP approach I mentioned above using my own
cls_trie
implementation also reached 100%, but later than the first. - This explains the
~4-5 min
compared to~2-3 min
total runtime of above attempts. - In the Task Manager I could also see that the CPU usage never hited
2%
.
解决方案很简单,我必须平衡CPU和内存的使用.
Solution was simple, I had to balance CPU and Memory usages.
- 我将一些DP方法更改为带有
if-conditions
的简单for-loops
. - CPU使用率现在达到
〜15%
. - 现在内存使用率达到
〜65%
. - 我知道这与每台计算机的CPU和内存容量有关.但是,在客户端计算机中,它现在也只需要运行
15秒
.
- I changed some DP approaches to simple
for-loops
withif-conditions
. - The CPU usage now hits
~15%
. - The Memory usage now hits
~65%
. - I know this is relative to the CPU and Memory capacity of each machine. But in the client machine it is also running in no more than
15 seconds
now.
我使用cls_trie实现创建了一个 GitHub存储库,并添加了一个excel带有示例用法的文件.
I created one GitHub repository with my cls_trie implementation and added one excel file with an example usage.
我是excel-vba领域的新手(目前工作了4个月).可能有一些方法可以改善我的cls_trie实现,我欢迎您提出建议:
I'm new to the excel-vba world (4 months working with it right now). There might probably have some ways to improve my cls_trie implementation, I'm openned to suggestions:
Option Explicit Public Keys As Collection Public Children As Variant Public IsLeaf As Boolean Public tObject As Variant Public tValue As Variant Public Sub Init() Set Keys = New Collection ReDim Children(0 To 255) As cls_trie IsLeaf = False Set tObject = Nothing tValue = 0 End Sub Public Function GetNodeAt(index As Integer) As cls_trie Set GetNodeAt = Children(index) End Function Public Sub CreateNodeAt(index As Integer) Set Children(index) = New cls_trie Children(index).Init End Sub ''' 'Following function will retrieve node for a given key, 'creating a entire new branch if necessary ''' Public Function GetNode(ByRef key As Variant) As cls_trie Dim node As cls_trie Dim b() As Byte Dim i As Integer Dim pos As Integer b = CStr(key) Set node = Me For i = 0 To UBound(b) Step 2 pos = b(i) Mod 256 If (node.GetNodeAt(pos) Is Nothing) Then node.CreateNodeAt pos End If Set node = node.GetNodeAt(pos) Next If (node.IsLeaf) Then 'already existed Else node.IsLeaf = True Keys.Add key End If Set GetNode = node End Function ''' 'Following function will get the value for a given key 'Creating the key if necessary ''' Public Function GetValue(ByRef key As Variant) As Variant Dim node As cls_trie Set node = GetNode(key) GetValue = node.tValue End Function ''' 'Following sub will set a value to a given key 'Creating the key if necessary ''' Public Sub SetValue(ByRef key As Variant, value As Variant) Dim node As cls_trie Set node = GetNode(key) node.tValue = value End Sub ''' 'Following sub will sum up a value for a given key 'Creating the key if necessary ''' Public Sub SumValue(ByRef key As Variant, value As Variant) Dim node As cls_trie Set node = GetNode(key) node.tValue = node.tValue + value End Sub ''' 'Following function will validate if given key exists ''' Public Function Exists(ByRef key As Variant) As Boolean Dim node As cls_trie Dim b() As Byte Dim i As Integer b = CStr(key) Set node = Me For i = 0 To UBound(b) Step 2 Set node = node.GetNodeAt(b(i) Mod 256) If (node Is Nothing) Then Exists = False Exit Function End If Next Exists = node.IsLeaf End Function ''' 'Following function will get another Trie from given key 'Creating both key and trie if necessary ''' Public Function GetTrie(ByRef key As Variant) As cls_trie Dim node As cls_trie Set node = GetNode(key) If (node.tObject Is Nothing) Then Set node.tObject = New cls_trie node.tObject.Init End If Set GetTrie = node.tObject End Function
您可以在上面的代码中看到:
You can see in the above code:
- 我至今尚未实现任何删除方法,因为我不需要它.但这很容易实现.
- 我将自己限制为256个孩子,因为在此项目中,我正在处理的文本基本上是小写和大写的
[az]
字母和数字,以及两个文本映射到相同文本的可能性分支节点趋于零.
- I hadn't implemented any delete method because I didn't need it till now. But it would be easy to implement.
- I limited myself to 256 children because in this project the text I'm working on is basically lowercase and uppercase
[a-z]
letters and numbers, and the probability that two text get mapped to the same branch node tends zero.
正如一位出色的编码人员所说,每个人都喜欢自己的代码,即使别人的代码太漂亮而又不受欢迎.我的结论
- 即使事实证明它可能比我的
cls_trie
实现更好,我也可能永远不会再使用Scripting.Dictionary
.
- I will probably never more use
Scripting.Dictionary
, even if it is proven that somehow it could be better than mycls_trie
implementation.
谢谢大家的帮助.
这篇关于脚本之外的性能替代方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
-