如何优化vlookup高搜索计数? (VLOOKUP的替代品) [英] How to optimize vlookup for high search count ? (alternatives to VLOOKUP)

查看:218
本文介绍了如何优化vlookup高搜索计数? (VLOOKUP的替代品)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻找vlookup的替代方案,在感兴趣的上下文中提高性能。



上下文如下:




  • 我有一个数据集{key; data},大(〜100'000条记录)

  • 我想执行一个数据集上的许多VLOOKUP操作(典型的用途是重新排序整个数据集)

  • 我的数据集没有重复键

  • 我只在寻找对于完全匹配(最后一个参数 VLOOKUP FALSE



要解释的模式:



参考表:(sheet1

  AB 
1
2 key1 data1
3 key2 data2
4 key3 data3
... ...
99999 key99998 data99998
100000 key99999 data99999
100001 key100000 data100000
100002

查找表:

  AB 
1
2 key51359 = VLOOKUP(A2; sheet1!$ A $ 2:$ B $ 100001; 2; FALSE)
3 key41232 = VLOOKUP(A3; sheet1!$ A $ 2:$ B $ 100001; 2; FALSE)
4 key10102 = VLOOKUP(A3; sheet1!$ A $ 2:$ B $ 100001; 2; FALSE)
...
99999 key4153 = VLOOKUP(A99999; sheet1!$ A $ 2:$ B $ 100001; 2; FALSE)
100000 key12818 = VLOOKUP(A100000; sheet1!$ A $ 2 :$ B $ 100001; 2; FALSE)
100001 key35032 = VLOOKUP(A100001; sheet1!$ A $ 2:$ B $ 100001; 2; FALSE)
100002

在我的Core i7 M 620 @ 2.67 GHz上,这个计算在〜10分钟内



在这种情况下,VLOOKUP有更好的表现吗?

解决方案

我考虑了以下替代方案:




  • VLOOKUP数组公式

  • MATCH / INDEX

  • VBA(使用字典) / li>


比较的表现是:




  • VLOOKUP简单公式:〜10分钟

  • VLOOKUP数组公式:〜10分钟(1:1性能指标)

  • MATCH / INDEX:〜 2分钟(5:1性能指标)

  • VBA(使用字典):〜6秒(100:1性能指标)



使用相同的参考表



1)查找表:(vlookup数组公式版本)

  AB 
1
2 key51359 {= VLOOKUP(A2:A10001; sheet1!$ A $ 2:$ B $ 100001; 2; FALSE)}
3 key41232公式在B2
4 key10102 ...扩展到
... ...
99999 key4153 ...单元格B100001
100000 key12818 ...(选择整个范围,然后按
100001 key35032 ... CTRL + SHIFT + ENTER使其成为数组公式)
100002

2)查找表:(match + index version)

  AB C 
1
2 key51359 = MATCH(A2; sheet1!$ A $ 2:$ A $ 100001;)= INDEX(sheet1!$ B $ 2:$ B $ 100001; B2)
3 key41232 = MATCH(A3; sheet1!$ A $ 2:$ A $ 100001;)= INDEX(sheet1!$ B $ 2:$ B $ 100001; B3)
4 key10102 = MATCH(A4; sheet1!$ A $ 2:$ A $ 100001 ;)= INDEX(sheet1!$ B $ 2:$ B $ 100001; B4)
... ... ...
99999 key4153 = MATCH(A99999; sheet1!$ A $ 2: $ A $ 100001;)= INDEX(sheet1!$ B $ 2:$ B $ 100001; B99999)
100000 key12818 = MATCH(A100000; sheet1!$ A $ 2:$ A $ 100001;)= INDEX(sheet1!$ B $ 2 :$ B $ 100001; B100000)
100001 key35032 = MATCH(A100001; sheet1!$ A $ 2:$ A $ 100001;)= INDEX(sheet1!$ B $ 2:$ B $ 100001; B100001)
100002

3)查找表:(vbalookup版本)

  AB 
1
2 key51359 {= vbalookup(A2:A50001; sheet1!$ A $ 2:$ B $ 100001; 2)}
3 key41232公式在B2
4 key10102 ...扩展到
... ...
50000 key91021 ...
50001 key42 ...单元格B50001
50002 key21873 {= vbalookup(A50002:A100001; sheet1!$ A $ 2:$ B $ 100001; 2)}
50003 key31415 B50001中的公式扩展到
... ...
99999 key4153 ...单元格B100001
100000 key12818 ... (选择整个范围,然后按
100001 key35032 ... CTRL + SHIFT + ENTER使其成为数组公式)
100002

NB :对于某些(外部内部)原因,vbalookup一次无法返回超过65536个数据。
所以我不得不将数组公​​式拆分成两个。



和相关的VBA代码:

 函数vbalookup(lookupRange As Range,refRange As Range,dataCol As Long)As Variant 
Dim dict As New Scripting.Dictionary
Dim myRow As Range
Dim I As Long,J As Long
Dim vResults()As Variant

'1.构建一个dictionnary
对于每个myRow在refRange.Columns(1)中.Cells
'将A:B附加到dictionnary
dict.Add myRow.Value,myRow.Offset(0,dataCol - 1).Value
Next myRow

'2.使用它遍历所有查找数据
ReDim vResults(1 To lookupRange.Rows.Count,1 To lookupRange.Columns.Count)As Variant
For I = 1 To lookupRange.Rows.Count
For J = 1 to lookupRange.Columns.Count
如果dict.Exists(lookupRange.Cells(I,J).Value)然后
vResults(I,J)= dict(lookupRange.Cells(I,J) .Value)
结束如果
下一个J
下一个我

vbalookup = vResults
结束函数

NB: Scripting.Dictionary 需要一个引用来指向 Microsoft Scripting Runtime ,它必须手动添加
(Excel VBA窗口中的Tools-> References菜单)



结论:



在这种情况下,使用字典的VBA比使用VLOOKUP要快100倍,比MATCH / INDEX快20倍。 p>

I am looking for alternatives to vlookup, with improved performance within the context of interest.

The context is the following:

  • I have a data set of {key;data} which is big (~ 100'000 records)
  • I want to perform a lot of VLOOKUP operations on the dataset (typical use is to reorder the whole dataset)
  • My data set has no duplicate keys
  • I am looking only for exact matches (last argument to VLOOKUP is FALSE)

A schema to explain :

Reference sheet : ("sheet1")

        A           B
     1
     2  key1        data1
     3  key2        data2
     4  key3        data3
   ...  ...         ...
 99999  key99998    data99998
100000  key99999    data99999
100001  key100000   data100000
100002

Lookup sheet:

        A           B
     1
     2  key51359    =VLOOKUP(A2;sheet1!$A$2:$B$100001;2;FALSE)
     3  key41232    =VLOOKUP(A3;sheet1!$A$2:$B$100001;2;FALSE)
     4  key10102    =VLOOKUP(A3;sheet1!$A$2:$B$100001;2;FALSE)
   ...  ...         ...
 99999  key4153     =VLOOKUP(A99999;sheet1!$A$2:$B$100001;2;FALSE)
100000  key12818    =VLOOKUP(A100000;sheet1!$A$2:$B$100001;2;FALSE)
100001  key35032    =VLOOKUP(A100001;sheet1!$A$2:$B$100001;2;FALSE)
100002

On my Core i7 M 620 @2.67 GHz, this computes in ~10 minutes

Are there alternatives to VLOOKUP with better performance in this context ?

解决方案

I considered the following alternatives:

  • VLOOKUP array-formula
  • MATCH / INDEX
  • VBA (using a dictionary)

The compared performance is:

  • VLOOKUP simple formula : ~10 minutes
  • VLOOKUP array-formula : ~10 minutes (1:1 performance index)
  • MATCH / INDEX : ~2 minutes (5:1 performance index)
  • VBA (using a dictionary) : ~6 seconds (100:1 performance index)

Using the same reference sheet

1) Lookup sheet: (vlookup array formula version)

         A          B
     1
     2   key51359    {=VLOOKUP(A2:A10001;sheet1!$A$2:$B$100001;2;FALSE)}
     3   key41232    formula in B2
     4   key10102    ... extends to
   ...   ...         ... 
 99999   key4153     ... cell B100001
100000   key12818    ... (select whole range, and press
100001   key35032    ... CTRL+SHIFT+ENTER to make it an array formula)
100002

2) Lookup sheet: (match+index version)

         A           B                                       C
      1
      2  key51359    =MATCH(A2;sheet1!$A$2:$A$100001;)       =INDEX(sheet1!$B$2:$B$100001;B2)
      3  key41232    =MATCH(A3;sheet1!$A$2:$A$100001;)       =INDEX(sheet1!$B$2:$B$100001;B3)
      4  key10102    =MATCH(A4;sheet1!$A$2:$A$100001;)       =INDEX(sheet1!$B$2:$B$100001;B4)
    ...  ...         ...                                     ...
  99999  key4153     =MATCH(A99999;sheet1!$A$2:$A$100001;)   =INDEX(sheet1!$B$2:$B$100001;B99999)
 100000  key12818    =MATCH(A100000;sheet1!$A$2:$A$100001;)  =INDEX(sheet1!$B$2:$B$100001;B100000)
 100001  key35032    =MATCH(A100001;sheet1!$A$2:$A$100001;)  =INDEX(sheet1!$B$2:$B$100001;B100001)
 100002

3) Lookup sheet: (vbalookup version)

       A          B
     1
     2  key51359    {=vbalookup(A2:A50001;sheet1!$A$2:$B$100001;2)}
     3  key41232    formula in B2
     4  key10102    ... extends to
   ...  ...         ...
 50000  key91021    ... 
 50001  key42       ... cell B50001
 50002  key21873    {=vbalookup(A50002:A100001;sheet1!$A$2:$B$100001;2)}
 50003  key31415    formula in B50001 extends to
   ...  ...         ...
 99999  key4153     ... cell B100001
100000  key12818    ... (select whole range, and press
100001  key35032    ... CTRL+SHIFT+ENTER to make it an array formula)
100002

NB : For some (external internal) reason, the vbalookup fails to return more than 65536 data at a time. So I had to split the array formula in two.

and the associated VBA code :

Function vbalookup(lookupRange As Range, refRange As Range, dataCol As Long) As Variant
  Dim dict As New Scripting.Dictionary
  Dim myRow As Range
  Dim I As Long, J As Long
  Dim vResults() As Variant

  ' 1. Build a dictionnary
  For Each myRow In refRange.Columns(1).Cells
    ' Append A : B to dictionnary
    dict.Add myRow.Value, myRow.Offset(0, dataCol - 1).Value
  Next myRow

  ' 2. Use it over all lookup data
  ReDim vResults(1 To lookupRange.Rows.Count, 1 To lookupRange.Columns.Count) As Variant
  For I = 1 To lookupRange.Rows.Count
    For J = 1 To lookupRange.Columns.Count
      If dict.Exists(lookupRange.Cells(I, J).Value) Then
        vResults(I, J) = dict(lookupRange.Cells(I, J).Value)
      End If
    Next J
  Next I

  vbalookup = vResults
End Function

NB: Scripting.Dictionary requires a referenc to Microsoft Scripting Runtime which must be added manually (Tools->References menu in the Excel VBA window)

Conclusion :

In this context, VBA using a dictionary is 100x faster than using VLOOKUP and 20x faster than MATCH/INDEX

这篇关于如何优化vlookup高搜索计数? (VLOOKUP的替代品)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆