Replace worksheet array formula with VBA memory array


Problem description


My worksheet requires the following array formula in BG2.

=INDEX('Client'!O$2:O$347473,
       MATCH(1, (('Client_Cost'!D$2:D$347473='Client'!BC2)*
                 ('Client_Cost'!E$2:E$347473='Client'!BE2)), 0))

This provides a two-column match (Client_Cost!D:D to Client!BC2 AND Client_Cost!E:E to Client!BE2) and returns the corresponding value from Client!O:O.

The large number of rows makes the array formula very calculation-intensive. I can deal with a few hundred rows (~90 seconds for 500 rows), but I need results all the way down to Client!BG347473 and I would like them sometime this year.

I've tried using Application.Evaluate to return the array formula's results into a variant array and then writing that array back to the worksheet en masse, but it isn't the improvement I was hoping for. I'm looking for alternatives.

Solution

First off, I would recommend developing alternative methods with a smaller data set. 5K or 10K rows will either demonstrate a noticeable improvement or not; you can always expand to the original data set once you are confident you are not going to get into a long 'Not responding' state that you have to crash out of.

A common method of removing the array from that style of array formula¹ is a 'helper' column that concatenates the two values from column D and E in the Client_Cost worksheet into a single, delimited value. For example, in Client_Cost!Z2 as,

=CONCATENATE(Client_Cost!D2, "|", Client_Cost!E2)

Filling that down to Client_Cost!Z347473 should only take a second or two.

Once that is set up, a single INDEX/MATCH function pair can provide a vastly more efficient lookup on a similarly concatenated Client!BC2 and Client!BE2. In Client!BG2 as,

=INDEX(Client!O$2:O$347473,
   MATCH(CONCATENATE(Client!BC2, "|", Client!BE2), 
         Client_Cost!Z$2:Z$347473, 0))

That takes 1 hr, 51 minutes for 350K rows. While not yet optimal, it is a big improvement over the estimated ~17.5 hours for the original array formula (extrapolating from ~90 seconds per 500 rows).

The next logical step in optimizing that method would be working with a VBA Scripting.Dictionary object. A dictionary holds its own unique index on its keys and the concatenated values could be stuffed into a dictionary object to facilitate virtually instantaneous lookups on a large number of items (i.e. rows).

Sub JR_CSE_in_Array()
    Dim olr As Long, rws As Long, JR_Count As Long, JR_Values As Variant
    Dim v As Long, vTMP As Variant, vTMPs As Variant, dVALs As Object

    Debug.Print Timer
    Set dVALs = CreateObject("Scripting.Dictionary")

    'get some dimensions to the various data ranges
    With Worksheets("Client_Cost")
        'only use as many rows as absolutely necessary
        olr = Application.Min(.Cells(Rows.Count, "D").End(xlUp).Row, _
                              .Cells(Rows.Count, "E").End(xlUp).Row)
        'store D & E
        vTMPs = .Range(.Cells(2, 4), .Cells(olr, 5)).Value2

    End With
    With Worksheets("Client")
        rws = Application.Min(.Cells(Rows.Count, "BC").End(xlUp).Row, _
                              .Cells(Rows.Count, "BE").End(xlUp).Row, _
                              UBound(vTMPs, 1))
        'override the above statement for sampling
        'rws = 5000

        'building the Dictionary object takes a fair bit of time but it is worth it
        vTMP = .Range(.Cells(2, 15), .Cells(olr, 15)).Value2
        For v = LBound(vTMPs, 1) To UBound(vTMPs, 1)
            If Not dVALs.Exists(Join(Array(vTMPs(v, 1), vTMPs(v, 2)), ChrW(8203))) Then _
                dVALs.Add Key:=Join(Array(vTMPs(v, 1), vTMPs(v, 2)), ChrW(8203)), Item:=vTMP(v, 1)
        Next v

        'store BC and BE
        vTMPs = .Range(.Cells(2, 55), .Cells(olr, 57)).Value2
    End With

    ReDim JR_Values(1 To rws, 1 To 1)   'force a two-dimension, one-based index on the array
    'Debug.Print LBound(JR_Values) & ":" & UBound(JR_Values)

    For JR_Count = LBound(JR_Values, 1) To UBound(JR_Values, 1) Step 1
        If dVALs.Exists(Join(Array(vTMPs(JR_Count, 1), vTMPs(JR_Count, 3)), ChrW(8203))) Then
            JR_Values(JR_Count, 1) = dVALs.Item(Join(Array(vTMPs(JR_Count, 1), vTMPs(JR_Count, 3)), ChrW(8203)))
        End If
    Next JR_Count

    With Worksheets("Client")
        .Range("BG2").Resize(UBound(JR_Values), 1) = JR_Values
    End With

    'Debug.Print dVALs.Count
    dVALs.RemoveAll: Set dVALs = Nothing
    Debug.Print Timer
End Sub

The elapsed time for that routine to run (without helper column(s)) was 45.72 seconds. Breaking it down, it took a full 13.4 seconds just to build the dictionary and the remainder was largely taken up by the actual lookup with a half-second here and there attributed to the bulk seeding of the variant arrays from the worksheets' values.
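To experiment with the dictionary lookup in isolation on a small in-memory sample, before pointing it at the full worksheets, something like the following sketch may help. The names `PairKey`, `TwoKeyLookup` and `TwoKeyLookup_Demo` are hypothetical and not part of the routine above. The sketch also assumes case-insensitive matching is wanted: a Scripting.Dictionary compares keys case-sensitively by default, whereas the worksheet MATCH does not, so `CompareMode` is set to `vbTextCompare` here.

```vba
Option Explicit

' Hypothetical helper: joins two values into one key, using the same
' zero-width-space delimiter as the routine above.
Private Function PairKey(ByVal a As Variant, ByVal b As Variant) As String
    PairKey = Join(Array(a, b), ChrW(8203))
End Function

' Hypothetical helper (not part of the original routine).
' src:    n x 3 one-based array of (key1, key2, item)
' probes: m x 2 one-based array of (key1, key2)
' Returns an m x 1 one-based array; unmatched rows are left Empty.
Public Function TwoKeyLookup(src As Variant, probes As Variant) As Variant
    Dim d As Object, r As Long, k As String, results As Variant

    Set d = CreateObject("Scripting.Dictionary")
    'Assumption: case-insensitive keys are wanted, as with worksheet MATCH.
    'CompareMode has to be set while the dictionary is still empty.
    d.CompareMode = vbTextCompare

    For r = 1 To UBound(src, 1)
        k = PairKey(src(r, 1), src(r, 2))
        If Not d.Exists(k) Then d.Add k, src(r, 3)   'keep the first match only
    Next r

    ReDim results(1 To UBound(probes, 1), 1 To 1)
    For r = 1 To UBound(probes, 1)
        k = PairKey(probes(r, 1), probes(r, 2))
        If d.Exists(k) Then results(r, 1) = d(k)
    Next r

    TwoKeyLookup = results
End Function

Sub TwoKeyLookup_Demo()
    Dim src(1 To 3, 1 To 3) As Variant, probes(1 To 2, 1 To 2) As Variant
    Dim res As Variant

    src(1, 1) = "A": src(1, 2) = 1: src(1, 3) = "first"
    src(2, 1) = "B": src(2, 2) = 2: src(2, 3) = "second"
    src(3, 1) = "A": src(3, 2) = 1: src(3, 3) = "duplicate"
    probes(1, 1) = "a": probes(1, 2) = 1    'differs from row 1 only by case
    probes(2, 1) = "C": probes(2, 2) = 9    'no such pair in src

    res = TwoKeyLookup(src, probes)
    Debug.Print res(1, 1), IsEmpty(res(2, 1))
End Sub
```

Keeping only the first occurrence of a key mirrors MATCH with a match type of 0, which also returns the first matching row.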

         

So the Scripting.Dictionary is the clear winner here. Unfortunately, it does not automatically recalculate when values in the various columns change, but at this stage of development the worksheet should be set to manual calculation anyway. Letting one of the formula-based solutions trigger a full recalculation every time a single value is retyped seems an inefficient expenditure of time.
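One way to apply the manual-calculation advice from code is sketched below. `RunWithManualCalc` is a hypothetical wrapper, not something from the routine above: it switches calculation to manual, runs a macro by name, and then restores whatever calculation mode was in effect before.

```vba
Option Explicit

' Hypothetical wrapper: runs a macro by name with calculation set to manual,
' then restores the previous calculation mode.
Public Sub RunWithManualCalc(ByVal macroName As String)
    Dim prevCalc As XlCalculation

    prevCalc = Application.Calculation

    On Error GoTo CleanUp
    Application.Calculation = xlCalculationManual
    Application.Run macroName

CleanUp:
    'restore the previous setting whether or not the macro failed
    Application.Calculation = prevCalc
    If Err.Number <> 0 Then Debug.Print "Macro failed: " & Err.Description
End Sub
```

It could be called from the Immediate window as `RunWithManualCalc "JR_CSE_in_Array"`.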

All-in-all, this makes perfect sense. The original array formula is analogous to an SQL SELECT statement with an INNER JOIN on two fields and if my SELECT statement was running inefficiently the first thing I would do to improve it would be to look at the tables' indexes.

On a related note, any workbook with this much data should be saved as an Excel Binary Workbook regardless of whether it is macro-enabled or not. The file size of a binary workbook (.XLSB) is typically ¹⁄₃ the size of an equivalent .XLSX or .XLSM. Beyond a faster initial load time, many bulk operations should prove faster.
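If the conversion is to be done from VBA rather than through Save As in the user interface, a minimal sketch could look like this. `BinaryName` and `SaveAsBinaryWorkbook` are hypothetical names, and the path handling assumes Windows-style `\` separators; `xlExcel12` is the built-in file format constant for an Excel Binary Workbook.

```vba
Option Explicit

' Hypothetical helper: swaps the file extension (if any) for .xlsb.
' Assumes a Windows-style path with "\" separators.
Public Function BinaryName(ByVal filePath As String) As String
    Dim dotPos As Long, sepPos As Long

    dotPos = InStrRev(filePath, ".")
    sepPos = InStrRev(filePath, "\")
    'only strip text after a dot that belongs to the file name, not a folder
    If dotPos > sepPos Then filePath = Left$(filePath, dotPos - 1)
    BinaryName = filePath & ".xlsb"
End Function

' Saves the workbook containing this code as an Excel Binary Workbook.
Public Sub SaveAsBinaryWorkbook()
    With ThisWorkbook
        .SaveAs Filename:=BinaryName(.FullName), FileFormat:=xlExcel12
    End With
End Sub
```

After `SaveAs`, the open workbook is the new .xlsb file; the original file is left on disk unchanged.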

Anyone wishing to test their own optimizations can find my sample .XLSB workbook here for the time being. Don't blindly run the procedures without seeing what you're getting into first.


¹ Array formulas need to be finalized with Ctrl+Shift+Enter↵. Once entered into the first cell correctly, they can be filled or copied down or right just like any other formula. Try to reduce your full-column references to ranges that more closely represent the extents of your actual data. Each array formula evaluates every row it references, so the total work grows with the number of formulas multiplied by the number of referenced rows, and it is good practice to narrow the referenced ranges to a minimum. See Guidelines and examples of array formulas for more information.
