Speed up looping through large datasets in Excel
Problem description
I have two datasets that I need to compare and extract a match from. Each dataset has a composite key made up of 5 columns, and a 6th column that I need to extract. The columns contain text, dates and integers. Both sets are slightly under 500k rows.
Currently I use a For loop over table A and, inside it, loop through table B. The rows are compared with an If statement that combines the five key columns with And.
Sub ArraySearch()
    Dim Main As Long
    Dim Search As Long
    Dim arrData() As Variant
    Dim arrSource As Variant

    ' Read both ranges into in-memory arrays in one go.
    arrData = Sheets("Sheet1").Range("H3:M500000").Value
    arrSource = Sheets("Ark1").Range("A3:H500000").Value

    ' For every source row, scan the data rows until all five key columns match.
    For Main = 1 To UBound(arrSource, 1)
        For Search = 1 To UBound(arrData, 1)
            If arrSource(Main, 3) = arrData(Search, 1) And _
               arrSource(Main, 4) = arrData(Search, 2) And _
               arrSource(Main, 1) = arrData(Search, 3) And _
               arrSource(Main, 2) = arrData(Search, 4) And _
               arrSource(Main, 5) = arrData(Search, 5) Then
                arrSource(Main, 8) = arrData(Search, 6)
                Exit For
            End If
        Next Search
    Next Main

    ' Write the updated source rows back in a single assignment.
    Sheets("Sheet2").Range("A3:H500000").Value = arrSource
End Sub
The fastest approach I have found so far is to load both tables into arrays and do the loop in memory.
Even so, this takes forever. We are talking about hours, not minutes.
Is there any way to increase the speed? Or do I need to use another tool? (Load the data into a database and use SQL, use Visual Studio with regular VB.NET, or SSIS.)
I was hoping this could be done in VBA, so any pointers would be much appreciated.
EDIT
Would hashing the 5-column key improve speed, or is it the sheer volume of rows that have to be iterated that creates the lag?
The fastest way to compare two lists is to add the values of one list to a Dictionary, keyed on the common key. A Dictionary is optimized for key lookups and will return the value for a key much faster than you can iterate through an array.
Sub DictionarySearch()
    Dim dict As Object
    Dim key As String
    Dim x As Long
    Dim arrData() As Variant
    Dim arrSource As Variant

    Set dict = CreateObject("Scripting.Dictionary")
    arrData = Worksheets("Sheet1").Range("H3:M500000").Value
    arrSource = Worksheets("Ark1").Range("A3:H500000").Value

    ' Pass 1: index the data rows by their composite key (first occurrence wins).
    For x = 1 To UBound(arrData, 1)
        key = arrData(x, 1) & ":" & arrData(x, 2) & ":" & arrData(x, 3) & ":" & arrData(x, 4) & ":" & arrData(x, 5)
        If Not dict.Exists(key) Then dict.Add key, arrData(x, 6)
    Next x

    ' Pass 2: build the same key from each source row and look it up.
    For x = 1 To UBound(arrSource, 1)
        key = arrSource(x, 3) & ":" & arrSource(x, 4) & ":" & arrSource(x, 1) & ":" & arrSource(x, 2) & ":" & arrSource(x, 5)
        If dict.Exists(key) Then arrSource(x, 8) = dict(key)
    Next x

    Worksheets("Sheet2").Range("A3:H500000").Value = arrSource
End Sub
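To try the dictionary lookup on a small scale before running it against the full sheets, something like the following sketch may help. It is not part of the original answer: BuildKey and DictionaryKeyDemo are hypothetical names, the two tiny in-memory arrays stand in for the worksheet ranges, and it assumes that no cell holds an error value (CStr would fail on one) and that the Chr$(31) delimiter never occurs in the data.

```vba
Option Explicit

' Hypothetical helper (not in the original answer): joins the chosen key
' columns of row r into one string, in the order given by cols.
Private Function BuildKey(arr As Variant, ByVal r As Long, cols As Variant) As String
    Dim i As Long
    Dim parts() As String
    ReDim parts(LBound(cols) To UBound(cols))
    For i = LBound(cols) To UBound(cols)
        parts(i) = CStr(arr(r, cols(i)))   ' assumes no error values in the data
    Next i
    BuildKey = Join(parts, Chr$(31))       ' assumed not to occur in the data
End Function

' Small in-memory demo, so it runs without any worksheet.
' Returns the source rows with column 3 filled in where a key matched.
Public Function DictionaryKeyDemo() As Variant
    Dim dict As Object
    Dim lookupRows(1 To 2, 1 To 3) As Variant   ' key part 1, key part 2, value
    Dim sourceRows(1 To 2, 1 To 3) As Variant   ' key part 2, key part 1, result
    Dim r As Long
    Dim key As String
    Dim started As Single

    lookupRows(1, 1) = "A": lookupRows(1, 2) = 10: lookupRows(1, 3) = "first"
    lookupRows(2, 1) = "B": lookupRows(2, 2) = 20: lookupRows(2, 3) = "second"
    ' Same key columns in a different order, as in the question.
    sourceRows(1, 1) = 20: sourceRows(1, 2) = "B"
    sourceRows(2, 1) = 99: sourceRows(2, 2) = "Z"

    started = Timer
    Set dict = CreateObject("Scripting.Dictionary")

    ' Index the lookup rows by composite key (first occurrence wins).
    For r = 1 To UBound(lookupRows, 1)
        key = BuildKey(lookupRows, r, Array(1, 2))
        If Not dict.Exists(key) Then dict.Add key, lookupRows(r, 3)
    Next r

    ' Look up each source row; unmatched rows keep an Empty result.
    For r = 1 To UBound(sourceRows, 1)
        key = BuildKey(sourceRows, r, Array(2, 1))
        If dict.Exists(key) Then sourceRows(r, 3) = dict(key)
    Next r

    Debug.Print "Elapsed seconds: "; Timer - started
    DictionaryKeyDemo = sourceRows
End Function
```

In this demo the first source row matches the "B"/20 lookup row and the second matches nothing. The reason the approach scales is the operation count: with n and m rows, the nested loops make up to n × m comparisons (roughly 2.5 × 10^11 for two tables of about 500k rows), while the dictionary version makes n insertions and m lookups.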