Speed up looping through large datasets in Excel

Problem description

I have two datasets that I need to compare and extract a match from. Each dataset has a composite key made up of 5 columns, and a 6th column that I need to extract. The columns contain text, dates and integers. Both sets are slightly under 500k rows.

Currently I use a For loop over table A and, for each row, loop through table B, comparing the rows with an If statement that chains the five key columns with And to form the composite key.

Sub ArraySearch()

    Dim Main As Long
    Dim Search As Long
    Dim arrData() As Variant
    Dim arrSource As Variant

    arrData = Sheets("Sheet1").Range("H3:M500000").Value
    arrSource = Sheets("Ark1").Range("A3:H500000").Value

    Main = 1
    Search = 1

    For Main = 1 To UBound(arrSource, 1)

        For Search = 1 To UBound(arrData, 1)

            If arrSource(Main, 3) = arrData(Search, 1) And _
                arrSource(Main, 4) = arrData(Search, 2) And _
                arrSource(Main, 1) = arrData(Search, 3) And _
                arrSource(Main, 2) = arrData(Search, 4) And _
                arrSource(Main, 5) = arrData(Search, 5) _
            Then
                arrSource(Main, 8) = arrData(Search, 6)
                Exit For
            End If

        Next
    Next

    Sheets("Sheet2").Range("A3:H500000") = arrSource

End Sub

The fastest way I have found so far is to load both tables into arrays and do the loop in memory.

This is taking forever. We are talking about hours, not minutes.

Are there any methods that will increase the speed? Or do I need to use some other program? (Load it into a database and use SQL, use Visual Studio with plain VB.NET, SSIS.)

I was hoping this could be done in VBA, so any pointers would be much appreciated.

EDIT

Would hashing the 5-column key improve speed, or is it the sheer volume of rows that has to be iterated that creates the lag?

Solution

The fastest way to compare two lists is to add the values from one list to a Dictionary, keyed on the common key. A Dictionary is optimized for key lookups and will return a value for a given key much faster than you can iterate through an array.
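To put rough numbers on this, here is a back-of-the-envelope count for the ranges used in the question (rows 3 to 500000, so n = m = 499,998). It assumes that each Dictionary insert or lookup costs roughly constant time, which is typical of a hash table but is an assumption here, not a measurement.

```latex
\[
N_{\text{nested}} \le n \cdot m = 499{,}998^{2} \approx 2.5 \times 10^{11}
\qquad\text{vs.}\qquad
N_{\text{dict}} \approx n + m = 999{,}996 \approx 10^{6}
\]
```

So the lag comes mainly from the number of row pairs the nested loops have to visit, and the two passes in the routine below avoid that.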

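Sub DictionarySearch()
    Dim dict As Object
    Dim key As String
    Dim x As Long
    Dim arrData() As Variant
    Dim arrSource As Variant

    ' Late-bound Scripting.Dictionary, so no library reference is required.
    Set dict = CreateObject("Scripting.Dictionary")

    ' Read both tables into memory in one operation each.
    arrData = Worksheets("Sheet1").Range("H3:M500000").Value
    arrSource = Worksheets("Ark1").Range("A3:H500000").Value

    ' Pass 1: index the lookup table by its composite key (columns 1-5),
    ' storing column 6. The first occurrence of a duplicate key wins.
    For x = 1 To UBound(arrData, 1)
        key = arrData(x, 1) & ":" & arrData(x, 2) & ":" & arrData(x, 3) & ":" & arrData(x, 4) & ":" & arrData(x, 5)
        If Not dict.Exists(key) Then dict.Add key, arrData(x, 6)
    Next

    ' Pass 2: build the same key from the source columns and look it up.
    For x = 1 To UBound(arrSource, 1)
        key = arrSource(x, 3) & ":" & arrSource(x, 4) & ":" & arrSource(x, 1) & ":" & arrSource(x, 2) & ":" & arrSource(x, 5)
        If dict.Exists(key) Then arrSource(x, 8) = dict(key)
    Next

    ' Write the updated array back to the sheet in a single operation.
    Worksheets("Sheet2").Range("A3:H500000").Value = arrSource
End Sub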
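One thing to watch with a concatenated key is that a separator which also appears in the data can make two different rows produce the same key. The sketch below is not part of the original answer: `CompositeKey` and `CompositeKeyDemo` are hypothetical names, and the choice of `Chr(31)` as separator assumes that character never occurs in the cells. The demo uses made-up in-memory rows so it runs without any worksheet.

```vba
Option Explicit

' Hypothetical helper (not in the original answer): builds the composite key
' for row r of a 2-D Variant array from five column indexes.
Function CompositeKey(arr As Variant, ByVal r As Long, _
                      ByVal c1 As Long, ByVal c2 As Long, ByVal c3 As Long, _
                      ByVal c4 As Long, ByVal c5 As Long) As String
    ' Assumption: Chr(31) (ASCII unit separator) never appears in the data.
    Dim sep As String
    sep = Chr(31)
    CompositeKey = arr(r, c1) & sep & arr(r, c2) & sep & arr(r, c3) & _
                   sep & arr(r, c4) & sep & arr(r, c5)
End Function

' Demo with made-up rows that would collide under a plain ":" separator.
Sub CompositeKeyDemo()
    Dim dict As Object
    Dim sample(1 To 2, 1 To 6) As Variant
    Dim r As Long
    Dim k As String

    Set dict = CreateObject("Scripting.Dictionary")

    ' Row 1: "a:b" | "c"   Row 2: "a" | "b:c"   (remaining key columns equal)
    sample(1, 1) = "a:b": sample(1, 2) = "c"
    sample(2, 1) = "a":   sample(2, 2) = "b:c"
    For r = 1 To 2
        sample(r, 3) = 1: sample(r, 4) = 2: sample(r, 5) = 3
    Next
    sample(1, 6) = "first": sample(2, 6) = "second"

    For r = 1 To 2
        k = CompositeKey(sample, r, 1, 2, 3, 4, 5)
        If Not dict.Exists(k) Then dict.Add k, sample(r, 6)
    Next

    ' Both rows are kept as distinct keys, so the count is 2.
    Debug.Print dict.Count
End Sub
```

In the answer's routine, the two key-building lines could then become `key = CompositeKey(arrData, x, 1, 2, 3, 4, 5)` and `key = CompositeKey(arrSource, x, 3, 4, 1, 2, 5)`. Like the original concatenation, this raises a type mismatch if a key cell holds an error value such as #N/A.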
