如何在VBA中循环400万个阵列时减少时间成本? [英] How to reduce the time cost while looping in a 4 million array in VBA?

查看:68
本文介绍了如何在VBA中循环400万个阵列时减少时间成本?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要使用VBA进行"vlookup"功能.我需要从包含460万条记录的访问数据库中查找数据.

I need to do the ‘vlookup’ function using the VBA. I need to lookup data from a access database containg 4.6 million records.

Private Sub connectDB()
 Dim sqlstr As String
 Dim mydata As String
 Dim t, d, conn, rst, mydata
 Dim arr, arr1
 t = Timer
 Set d = CreateObject("scripting.dictionary")
 Set conn = CreateObject("ADODB.Connection")
 Set rst = CreateObject("ADODB.Recordset")
 mydata = "mydatabase"
 strconn = "Provider = Microsoft.ACE.OLEDB.16.0; Data Source = " & mydata
 sqlstr = "select Tracking, MAWB from total"
 rst.Open sqlstr, strconn, 3, 2
 arr1 = Array("Tracking", "MAWB")
 arr = rst.GetRows(-1, 1, arr1)
STOP
#Above cost 1mins
 For i = 0 To UBound(arr, 2)
    d(arr(0, i)) = arr(1, i)
Next
STOP
#Put data into dictionary always costs me 20 mins

上面的过程总是要花费我20分钟左右.而且大多数都花在将数据放入字典中

The procedure above always cost me around 20 mins. And the majority of them are spent on the putting data into dictionary

是否要减少时间成本?

推荐答案

您可以通过实现自己的哈希表/词典.

You could reduce significantly the lookup time by implementing your own hashtable/dictionary.

下面是在5秒内为400万个数组编制索引的示例:

Here's an example indexing a 4 millions array under 5 seconds:

Private Declare PtrSafe Function RtlComputeCrc32 Lib "ntdll.dll" ( _
  ByVal start As Long, ByVal data As LongPtr, ByVal size As Long) As Long

Sub Example()
  Dim data(), slots() As Long, i As Long

  ' generate some records '

  ReDim data(0 To 1, 0 To 4000000)
  For i = 0 To UBound(data, 2)
    data(0, i) = CStr(i)
  Next

  ' index all the keys from column 1 '

  MapKeys slots, data, column:=0

  ' lookup a key in column 1 '

  i = IndexOfKey(slots, data, column:=0, key:="4876")

  If i >= 0 Then
    Debug.Print "Found at index " & i
  Else
    Debug.Print "Missing"
  End If

End Sub


Public Sub MapKeys(slots() As Long, data(), column As Long)
  Dim bucketsCount&, key$, r&, i&, s&, h&      
  bucketsCount = UBound(data, 2) * 0.9   ' n * load factor '
  ReDim slots(0 To UBound(data, 2) + bucketsCount)

  For r = 0 To UBound(data, 2) ' each record '
    key = data(column, r)
    h = RtlComputeCrc32(0, StrPtr(key), LenB(key)) And &H7FFFFFF  ' get hash '
    s = UBound(slots) - (h Mod bucketsCount)                      ' get slot '
    Do
      i = slots(s) - 1& ' get index (base 0) '

      If i >= 0& Then  ' if index for hash '
        If data(column, i) = data(column, r) Then Exit Do  ' if key present, handle next record '
      Else
        slots(s) = r + 1&  ' add index (base 1) '
        Exit Do
      End If

      s = i  ' collision, index points to the next slot '
    Loop
  Next
End Sub

Public Function IndexOfKey(slots() As Long, data(), column As Long, key As String) As Long
  Dim h&, s&, i&
  h = RtlComputeCrc32(0, StrPtr(key), LenB(key)) And &H7FFFFFF    ' get hash  '
  s = UBound(slots) - (h Mod (UBound(slots) - UBound(data, 2)))   ' get slot  '
  i = slots(s) - 1&                                               ' get index (base 0) '

  Do While i >= 0&
    If data(column, i) = key Then Exit Do  ' break if same key '
    i = slots(i) - 1&                      ' collision, index points to the next slot '
  Loop

  IndexOfKey = i
End Function

这篇关于如何在VBA中循环400万个阵列时减少时间成本?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆