Replace worksheet array formula with VBA memory array

Problem description

My worksheet requires the following array formula in BG2.

=INDEX('Client'!O$2:O$347473,
       MATCH(1, (('Client_Cost'!D$2:D$347473='Client'!BC2)*
                 ('Client_Cost'!E$2:E$347473='Client'!BE2)), 0))

This provides a two-column match (Client_Cost!D:D to Client!BC2 AND Client_Cost!E:E to Client!BE2) and returns the corresponding value from Client!O:O.
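
To make the cost concrete, here is a rough Python model (not VBA; an illustration only) of what that formula does for a single lookup row. The function name and the small sample lists are hypothetical, and `None` stands in for Excel's #N/A.

```python
# Hypothetical model of the BG2 array formula for ONE lookup row:
# test every row of D and E, multiply the two TRUE/FALSE results, and
# return the O value at the first position where both are TRUE.
def array_formula_lookup(col_d, col_e, col_o, bc, be):
    # The full 1/0 array is built first, as the array formula must do
    flags = [int(d == bc) * int(e == be) for d, e in zip(col_d, col_e)]
    try:
        pos = flags.index(1)  # MATCH(1, flags, 0): first exact match
    except ValueError:
        return None  # stands in for #N/A
    return col_o[pos]  # INDEX(O, pos); O is row-aligned with D:E


col_d = ["A", "A", "B", "B"]
col_e = [1, 2, 1, 2]
col_o = [10, 20, 30, 40]
print(array_formula_lookup(col_d, col_e, col_o, "B", 1))  # 30
print(array_formula_lookup(col_d, col_e, col_o, "C", 1))  # None
```

Every one of the formulas down column BG repeats that full pass over D and E.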

The large number of rows makes the array formula very calculation-intensive. I can deal with a few hundred rows (~90 seconds for 500 rows), but I need results all the way down to Client!BG347473 and I would like them sometime this year.
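
A quick back-of-the-envelope check of that scaling, using only the figures quoted above (Python as a calculator; the 350K target is rounded from the ~347K rows):

```python
# Figures from the question: 500 rows took about 90 seconds.
SAMPLE_ROWS = 500
SAMPLE_SECONDS = 90
TARGET_ROWS = 350_000  # roughly Client!BG2:BG347473

# Each formula scans the same fixed D:E range, so the time per formula
# stays roughly constant and the total grows linearly with formula count.
seconds = SAMPLE_SECONDS / SAMPLE_ROWS * TARGET_ROWS
hours = seconds / 3600
print(f"{hours:.1f} h")  # 17.5 h

# Element comparisons: each of ~347K formulas tests both columns
# of the ~347K-row range.
rows = 347_472
comparisons = rows * rows * 2
print(f"{comparisons:.2e} comparisons")
```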

I've tried using Application.Evaluate to return a result from the array formula into a variant array and subsequently returning the array of results to the worksheet en masse, but it isn't the improvement I was hoping for. Looking for alternatives.

Solution

First off, I would recommend developing alternative methods with a smaller data set. 5K or 10K rows will either demonstrate a noticeable improvement or not; you can always expand to the original data set once you are confident you are not going to get into a long 'Not responding' state that you have to crash out of.
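
One way to act on that advice, sketched in Python under stated assumptions: a hypothetical `compare_on_sample` helper runs a trusted reference method and a candidate on a small slice of the keys, refuses to continue if they disagree, and gives a crude linear extrapolation of the candidate's run time. The random table, the two lookup functions and the deliberately tiny sample size are made up for illustration.

```python
import random
import time


# Hypothetical harness: check a candidate against a reference on the
# first n keys only, then extrapolate the candidate's time.
def compare_on_sample(reference, candidate, keys, n, full_rows):
    sample = keys[:n]
    start = time.perf_counter()
    got = candidate(sample)
    elapsed = time.perf_counter() - start
    if got != reference(sample):
        raise ValueError("candidate disagrees with reference on the sample")
    # Only a rough guide: setup work such as building an index does not
    # scale with the number of lookups.
    return elapsed / n * full_rows


random.seed(1)
# Illustrative (D, E, O) rows with many duplicate (D, E) pairs.
table = [(random.randint(1, 50), random.randint(1, 50), i)
         for i in range(5000)]


def reference(pairs):
    # Linear scan per lookup, first match wins (like the array formula).
    return [next((o for d, e, o in table if d == bc and e == be), None)
            for bc, be in pairs]


def candidate(pairs):
    # Hash index built once, first occurrence kept.
    index = {}
    for d, e, o in table:
        index.setdefault((d, e), o)
    return [index.get(p) for p in pairs]


keys = [(random.randint(1, 60), random.randint(1, 60)) for _ in range(2000)]
estimate = compare_on_sample(reference, candidate, keys, n=200,
                             full_rows=347_472)
print(f"estimated {estimate:.2f} s for 347,472 lookups")
```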

A common method of removing the array from that style of array formula¹ is a 'helper' column that concatenates the two values from columns D and E in the Client_Cost worksheet into a single, delimited value. For example, in Client_Cost!Z2:

=CONCATENATE(Client_Cost!D2, "|", Client_Cost!E2)

Filling that down to Client_Cost!Z347473 should only take a second or two.
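
The delimiter is not cosmetic. A small Python illustration (the `helper_key` name is hypothetical) of what can go wrong when two columns are concatenated without one:

```python
# Model of one helper-column cell: D and E joined with a delimiter,
# like the "|" in the CONCATENATE formula.
def helper_key(d, e, sep="|"):
    return f"{d}{sep}{e}"


col_d = ["AB", "A"]
col_e = ["C", "BC"]

# Without a delimiter, two different (D, E) pairs collapse to one key...
print([helper_key(d, e, sep="") for d, e in zip(col_d, col_e)])
# ['ABC', 'ABC']

# ...with one, they stay distinct (provided "|" never occurs in the data).
print([helper_key(d, e) for d, e in zip(col_d, col_e)])
# ['AB|C', 'A|BC']
```

Any delimiter works as long as it can never appear in the data itself; the VBA later in this answer uses a zero-width space, ChrW(8203), for the same reason.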

Once that is set up, a single INDEX/MATCH function pair can provide a vastly more efficient lookup on a similarly concatenated Client!BC2 and Client!BE2. In Client!BG2:

=INDEX(Client!O$2:O$347473,
   MATCH(CONCATENATE(Client!BC2, "|", Client!BE2), 
         Client_Cost!Z$2:Z$347473, 0))
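
Modelled in Python (hypothetical names, illustration only), the helper-column lookup is a single exact-match search over one column of pre-built keys rather than the product of two boolean arrays. It is still a linear scan per lookup, which is why it stays slow at this scale:

```python
# Model of INDEX(O, MATCH(BC & "|" & BE, Z, 0)) for one lookup row.
def helper_lookup(helper_col, col_o, bc, be, sep="|"):
    key = f"{bc}{sep}{be}"
    try:
        # list.index is a linear search returning the first exact match
        return col_o[helper_col.index(key)]
    except ValueError:
        return None  # stands in for #N/A


helper_col = ["A|1", "A|2", "B|1"]
col_o = [10, 20, 30]
print(helper_lookup(helper_col, col_o, "A", 2))  # 20
```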

That takes about 1 hr 51 min for 350K rows. While not yet optimal, it is a big improvement over the estimated ~17.5 hours the original would take.

The next logical step in optimizing that method would be working with a VBA Scripting.Dictionary object. A dictionary holds its own unique index on its keys and the concatenated values could be stuffed into a dictionary object to facilitate virtually instantaneous lookups on a large number of items (i.e. rows).
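
The same idea as a minimal Python sketch, assuming the three columns have already been read into lists: one pass builds a hash index on the (D, E) pair, then each Client row needs one average-constant-time lookup. Python tuples serve as keys here, so no delimiter is needed; `build_index` and `lookup_all` are hypothetical names.

```python
# One pass over Client_Cost D:E (and the row-aligned O values) builds the
# index; each lookup afterwards is a hash probe instead of a scan.
def build_index(col_d, col_e, col_o):
    index = {}
    for d, e, o in zip(col_d, col_e, col_o):
        # keep the first occurrence, as MATCH(..., 0) would
        index.setdefault((d, e), o)
    return index


def lookup_all(index, col_bc, col_be):
    # None plays the role of #N/A for pairs that are not found
    return [index.get((bc, be)) for bc, be in zip(col_bc, col_be)]


index = build_index(["A", "A", "B"], [1, 1, 2], [10, 11, 12])
print(lookup_all(index, ["A", "B", "C"], [1, 2, 1]))  # [10, 12, None]
```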

Sub JR_CSE_in_Array()
    Dim olr As Long, rws As Long, JR_Count As Long, JR_Values As Variant
    Dim v As Long, vTMP As Variant, vTMPs As Variant, dVALs As Object
    Dim sKey As String

    Debug.Print Timer
    Set dVALs = CreateObject("Scripting.Dictionary")

    'get some dimensions to the various data ranges
    With Worksheets("Client_Cost")
        'only use as many rows as absolutely necessary
        olr = Application.Min(.Cells(.Rows.Count, "D").End(xlUp).Row, _
                              .Cells(.Rows.Count, "E").End(xlUp).Row)
        'store D & E
        vTMPs = .Range(.Cells(2, 4), .Cells(olr, 5)).Value2
    End With

    With Worksheets("Client")
        'number of lookup rows in BC and BE (data starts in row 2)
        rws = Application.Min(.Cells(.Rows.Count, "BC").End(xlUp).Row, _
                              .Cells(.Rows.Count, "BE").End(xlUp).Row) - 1
        'override the above statement for sampling
        'rws = 5000

        'building the Dictionary object takes a fair bit of time but it is worth it
        'return values come from Client!O, row-aligned with Client_Cost!D:E (as in the original INDEX/MATCH)
        vTMP = .Range(.Cells(2, 15), .Cells(olr, 15)).Value2
        For v = LBound(vTMPs, 1) To UBound(vTMPs, 1)
            'composite key: D & zero-width space & E
            sKey = Join(Array(vTMPs(v, 1), vTMPs(v, 2)), ChrW(8203))
            'keep the first occurrence, as MATCH(..., 0) would
            If Not dVALs.Exists(sKey) Then dVALs.Add Key:=sKey, Item:=vTMP(v, 1)
        Next v

        'store BC through BE for the lookup rows only (columns 1 and 3 of this array are used)
        vTMPs = .Range(.Cells(2, 55), .Cells(rws + 1, 57)).Value2
    End With

    ReDim JR_Values(1 To rws, 1 To 1)   'force a two-dimension, one-based index on the array
    'Debug.Print LBound(JR_Values) & ":" & UBound(JR_Values)

    For JR_Count = LBound(JR_Values, 1) To UBound(JR_Values, 1)
        sKey = Join(Array(vTMPs(JR_Count, 1), vTMPs(JR_Count, 3)), ChrW(8203))
        'unmatched rows are left blank
        If dVALs.Exists(sKey) Then JR_Values(JR_Count, 1) = dVALs.Item(sKey)
    Next JR_Count

    With Worksheets("Client")
        .Range("BG2").Resize(UBound(JR_Values, 1), 1).Value2 = JR_Values
    End With

    'Debug.Print dVALs.Count
    dVALs.RemoveAll: Set dVALs = Nothing
    Debug.Print Timer
End Sub
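
One detail in that routine deserves a note: it checks dVALs.Exists before calling dVALs.Add. With duplicate (D, E) pairs, adding only when the key is missing keeps the first row, which is what MATCH(..., 0) returns; assigning through dVALs.Item(key) = value would keep the last row instead, and calling .Add on an existing key raises a run-time error. A tiny Python illustration (not VBA) of the first-wins versus last-wins difference:

```python
# Two Client_Cost rows share the same (D, E) pair.
rows = [("A", 1, "first"), ("A", 1, "second")]

first_wins = {}
last_wins = {}
for d, e, o in rows:
    # like: If Not dVALs.Exists(key) Then dVALs.Add key, value
    if (d, e) not in first_wins:
        first_wins[(d, e)] = o
    # like: dVALs.Item(key) = value (adds or overwrites)
    last_wins[(d, e)] = o

print(first_wins[("A", 1)], last_wins[("A", 1)])  # first second
```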

The elapsed time for that routine to run (without helper column(s)) was 45.72 seconds. Breaking it down, it took a full 13.4 seconds just to build the dictionary and the remainder was largely taken up by the actual lookup with a half-second here and there attributed to the bulk seeding of the variant arrays from the worksheets' values.
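
Putting the quoted timings side by side (Python as a calculator; the 17.5-hour figure is the estimate from above, not a measurement):

```python
# Timings quoted in this answer for ~350K rows, in seconds.
original = 17.5 * 3600               # estimated, original array formula
helper_column = (1 * 60 + 51) * 60   # 1 hr 51 min, helper column + INDEX/MATCH
dictionary = 45.72                   # Scripting.Dictionary routine
build = 13.4                         # portion spent building the dictionary

print(f"helper column vs original: {original / helper_column:.1f}x")   # 9.5x
print(f"dictionary vs helper column: {helper_column / dictionary:.0f}x")  # 146x
print(f"dictionary vs original: {original / dictionary:.0f}x")  # 1378x
print(f"time spent building the dictionary: {build / dictionary:.0%}")  # 29%
```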

         

So the Scripting.Dictionary is the clear winner here. Unfortunately, its results do not update automatically when values in the various columns change, but at this stage of development the worksheet should be set to manual calculation anyway. Triggering a recalculation of one of the formula-based solutions every time a single value is retyped seems an inefficient use of time.

All in all, this makes perfect sense. The original array formula is analogous to an SQL SELECT statement with an INNER JOIN on two fields, and if my SELECT statement were running inefficiently, the first thing I would look at to improve it would be the tables' indexes.
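
The analogy can be tried directly with Python's built-in sqlite3 module. This is a hypothetical model, not part of the workbook: the table and index names are made up, and it assumes the returned value is stored in the cost table next to D and E (in the formula it comes from Client!O by row position) and that the (D, E) pairs are unique.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE client_cost (d TEXT, e INTEGER, o INTEGER)")
con.execute("CREATE TABLE client (bc TEXT, be INTEGER)")
con.executemany("INSERT INTO client_cost VALUES (?, ?, ?)",
                [("A", 1, 10), ("A", 2, 20), ("B", 1, 30)])
con.executemany("INSERT INTO client VALUES (?, ?)",
                [("B", 1), ("A", 2), ("C", 9)])

# A composite index on the joined columns lets the engine seek each
# (bc, be) pair instead of scanning client_cost for every client row.
con.execute("CREATE INDEX ix_cost_de ON client_cost (d, e)")

rows = con.execute(
    """SELECT c.rowid, k.o
         FROM client AS c
         JOIN client_cost AS k ON k.d = c.bc AND k.e = c.be
        ORDER BY c.rowid"""
).fetchall()
print(rows)  # [(1, 30), (2, 20)]
```

Note that an INNER JOIN simply drops Client rows with no match, where the formula would return #N/A; a LEFT JOIN would keep them with a NULL.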

On a related note, any workbook with this much data should be saved as an Excel Binary Workbook (.XLSB), regardless of whether it is macro-enabled. A binary workbook is typically about ¹⁄₃ the size of an equivalent .XLSX or .XLSM. Beyond a faster initial load time, many bulk operations should prove faster.

Anyone wishing to test their own optimizations can find my sample .XLSB workbook here for the time being. Don't blindly run the procedures without seeing what you're getting into first.


¹ Array formulas need to be finalized with Ctrl+Shift+Enter↵. Once entered into the first cell correctly, they can be filled or copied down or right just like any other formula. Try to reduce your full-column references to ranges that more closely represent the extents of your actual data. Array formulas chew up calculation cycles quickly: each cell evaluates every row of the ranges it references, so filling one down a long column multiplies that cost by the number of rows. It is good practice to narrow the referenced ranges to a minimum. See Guidelines and examples of array formulas for more information.
