如何优化 vlookup 以获得高搜索次数?(VLOOKUP 的替代品) [英] How to optimize vlookup for high search count ? (alternatives to VLOOKUP)
问题描述
我正在寻找 vlookup 的替代方案,在感兴趣的上下文中提高性能.
I am looking for alternatives to vlookup, with improved performance within the context of interest.
上下文如下:
- 我有一个很大的 {key;data} 数据集(约 100'000 条记录)
- 我想对数据集执行很多 VLOOKUP 操作(典型用途是对整个数据集重新排序)
- 我的数据集没有重复的键
- 我只寻找完全匹配(
VLOOKUP
的最后一个参数是FALSE
)
- I have a data set of {key;data} which is big (~ 100'000 records)
- I want to perform a lot of VLOOKUP operations on the dataset (typical use is to reorder the whole dataset)
- My data set has no duplicate keys
- I am looking only for exact matches (last argument to
VLOOKUP
isFALSE
)
要解释的模式:
参考表:("sheet1"
)
A B
1
2 key1 data1
3 key2 data2
4 key3 data3
... ... ...
99999 key99998 data99998
100000 key99999 data99999
100001 key100000 data100000
100002
查找表:
A B
1
2 key51359 =VLOOKUP(A2;sheet1!$A$2:$B$100001;2;FALSE)
3 key41232 =VLOOKUP(A3;sheet1!$A$2:$B$100001;2;FALSE)
4 key10102 =VLOOKUP(A3;sheet1!$A$2:$B$100001;2;FALSE)
... ... ...
99999 key4153 =VLOOKUP(A99999;sheet1!$A$2:$B$100001;2;FALSE)
100000 key12818 =VLOOKUP(A100000;sheet1!$A$2:$B$100001;2;FALSE)
100001 key35032 =VLOOKUP(A100001;sheet1!$A$2:$B$100001;2;FALSE)
100002
在我的 Core i7 M 620 @2.67 GHz 上,计算时间约为 10 分钟
On my Core i7 M 620 @2.67 GHz, this computes in ~10 minutes
在这种情况下,是否有性能更好的 VLOOKUP 替代方案?
Are there alternatives to VLOOKUP with better performance in this context ?
推荐答案
我考虑了以下替代方案:
I considered the following alternatives:
- VLOOKUP 数组公式
- 匹配/索引
- VBA(使用字典)
比较性能为:
- VLOOKUP 简单公式:~10 分钟
- VLOOKUP 数组公式:约 10 分钟(1:1 性能指数)
- 比赛/指数:~2 分钟(5:1 表现指数)
- VBA(使用字典):约 6 秒(100:1 性能指数)
使用相同的参考表
1) 查找表:(vlookup 数组公式版)
1) Lookup sheet: (vlookup array formula version)
A B
1
2 key51359 {=VLOOKUP(A2:A10001;sheet1!$A$2:$B$100001;2;FALSE)}
3 key41232 formula in B2
4 key10102 ... extends to
... ... ...
99999 key4153 ... cell B100001
100000 key12818 ... (select whole range, and press
100001 key35032 ... CTRL+SHIFT+ENTER to make it an array formula)
100002
2) 查找表:(匹配+索引版本)
2) Lookup sheet: (match+index version)
A B C
1
2 key51359 =MATCH(A2;sheet1!$A$2:$A$100001;) =INDEX(sheet1!$B$2:$B$100001;B2)
3 key41232 =MATCH(A3;sheet1!$A$2:$A$100001;) =INDEX(sheet1!$B$2:$B$100001;B3)
4 key10102 =MATCH(A4;sheet1!$A$2:$A$100001;) =INDEX(sheet1!$B$2:$B$100001;B4)
... ... ... ...
99999 key4153 =MATCH(A99999;sheet1!$A$2:$A$100001;) =INDEX(sheet1!$B$2:$B$100001;B99999)
100000 key12818 =MATCH(A100000;sheet1!$A$2:$A$100001;) =INDEX(sheet1!$B$2:$B$100001;B100000)
100001 key35032 =MATCH(A100001;sheet1!$A$2:$A$100001;) =INDEX(sheet1!$B$2:$B$100001;B100001)
100002
3) 查找表:(vbalookup 版本)
3) Lookup sheet: (vbalookup version)
A B
1
2 key51359 {=vbalookup(A2:A50001;sheet1!$A$2:$B$100001;2)}
3 key41232 formula in B2
4 key10102 ... extends to
... ... ...
50000 key91021 ...
50001 key42 ... cell B50001
50002 key21873 {=vbalookup(A50002:A100001;sheet1!$A$2:$B$100001;2)}
50003 key31415 formula in B50001 extends to
... ... ...
99999 key4153 ... cell B100001
100000 key12818 ... (select whole range, and press
100001 key35032 ... CTRL+SHIFT+ENTER to make it an array formula)
100002
注意:由于某些(外部内部)原因,vbalookup 无法一次返回超过 65536 条数据.所以我不得不将数组公式一分为二.
NB : For some (external internal) reason, the vbalookup fails to return more than 65536 data at a time. So I had to split the array formula in two.
和相关的 VBA 代码:
and the associated VBA code :
Function vbalookup(lookupRange As Range, refRange As Range, dataCol As Long) As Variant
Dim dict As New Scripting.Dictionary
Dim myRow As Range
Dim I As Long, J As Long
Dim vResults() As Variant
' 1. Build a dictionnary
For Each myRow In refRange.Columns(1).Cells
' Append A : B to dictionnary
dict.Add myRow.Value, myRow.Offset(0, dataCol - 1).Value
Next myRow
' 2. Use it over all lookup data
ReDim vResults(1 To lookupRange.Rows.Count, 1 To lookupRange.Columns.Count) As Variant
For I = 1 To lookupRange.Rows.Count
For J = 1 To lookupRange.Columns.Count
If dict.Exists(lookupRange.Cells(I, J).Value) Then
vResults(I, J) = dict(lookupRange.Cells(I, J).Value)
End If
Next J
Next I
vbalookup = vResults
End Function
注意:Scripting.Dictionary
需要对 Microsoft Scripting Runtime
的引用,它必须是手动添加(Excel VBA 窗口中的工具->参考菜单)
NB: Scripting.Dictionary
requires a referenc to Microsoft Scripting Runtime
which must be
added manually (Tools->References menu in the Excel VBA window)
结论:
在这种情况下,使用字典的 VBA 比使用 VLOOKUP 快 100 倍,比使用 MATCH/INDEX 快 20 倍
In this context, VBA using a dictionary is 100x faster than using VLOOKUP and 20x faster than MATCH/INDEX
这篇关于如何优化 vlookup 以获得高搜索次数?(VLOOKUP 的替代品)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!