搜索集合 [英] Searching collections
问题描述
我正在使用相当大的数据集(> 100,000行),并尝试比较两个列表,以确定新列表中的哪些项目尚未在主列表中。换句话说,我想找到新的独特的项目。
我有一些使用vlookup和数组的VBA代码,但数组太大时会弹出〜70,000)。所以我转向收藏。但是我无法使用vlookup或匹配搜索集合。
Sub find_uniqueIDs()
Dim a As Long
Dim n As Long
Dim m As Variant
Dim oldnum As Long
Dim oldIDs As Variant
Dim oldcoll As New Collection
Dim newnum As Long
Dim newIDs As Variant
Dim newcoll As New Collection
oldnum = 75000
oldIDs = Range(A1,Range(A& oldnum))
newnum = 45000 + 3
newIDs = Range(G3,Range(G& newnum))
'使用数组搜索,但是当oldnum或newnum为〜70000
时为n = 1 to newnum - 3
m = Application.VLookup(newIDs(n,1),oldIDs,1,False)
如果IsError(m)Then Range(E100000)。End(xlUp)偏移(1,0)= newID(n,1)
下一步n
'使用集合搜索
对于n = 1 To oldnum
错误恢复下一步
oldcoll.Add oldIDs(n,1)
错误GoTo 0
下一个n
对于m = 1到newnum
在错误恢复下一个
newcoll.Add newID(m,1)
错误GoTo 0
下一个m
'这一段代码不起作用
对于a = 1到newcoll.Count
如果Application.VLookup(newcoll(a),oldcoll,1,False)=#N / AThen _
Range(E100000)。End(xlUp).Offset(1,0)= newcoll(a)
下一个
End Sub
任何想法如何确定特定项目是否在使用集合的主列表中?
这是一个简短的子演示一些脚本字典方法。
Sub list_New_Unique()
Dim dMASTER As Object,dNEW As Object,k As Variant
Dim v As Long,vVALs()As Variant,vNEWs()As Variant
Debug.Print开始:&定时器
设置dMASTER = CreateObject(Scripting.Dictionary)
设置dNEW = CreateObject(Scripting.Dictionary)
dMASTER.comparemode = vbTextCompare
dNEW。 comparemode = vbTextCompare
与工作表(Sheet7)
vVALs = .Range(A2:A100000)。Value2
vNEWs = .Range(C2:C100000) .Value2
结束
'填充dMASTER值
对于v = LBound(vVALs,1)到UBound(vVALs,1)
dMASTER.Add Key: = vVALs(v,1),项目:= vVALs(v,1)
下一个v
'只填充dNEW,在dMASTER
中找不到项目对于v = LBound vNEWs,1)到UBound(vNEWs,1)
如果不是dMASTER.exists(vNEWs(v,1))然后
如果不是dNEW.exists(vNEWs(v,1))Then _
dNEW.Add Key:= vNEWs(v,1),Item:= vNEWs(v,1)
End If
下一个v
Debug.Print dNEW.Count
对于每个k在dNEW.keys
'Debug.Print k
下一页k
Debug.PrintEnd:&定时器
dNEW.RemoveAll:设置dNEW = Nothing
dMASTER.RemoveAll:设置dMASTER = Nothing
End Sub
在A2:A100000中有99,999个唯一条目,C2:C89747中有89747个随机条目,这在9.87秒内发现没有在A2:A100000中找到70,087个独特的新条目。
I'm working with a rather large dataset (>100,000 rows) and trying to compare two lists to figure out which items in the new list are not already in the master list. In other words I want to find the new unique items.
I have some VBA code that uses vlookup and arrays that works, but bombs out when the arrays get too big (~70,000). So I've turned to collections. However I'm having difficulty searching the collections using vlookup or match.
Sub find_uniqueIDs()
Dim a As Long
Dim n As Long
Dim m As Variant
Dim oldnum As Long
Dim oldIDs As Variant
Dim oldcoll As New Collection
Dim newnum As Long
Dim newIDs As Variant
Dim newcoll As New Collection
oldnum = 75000
oldIDs = Range("A1", Range("A" & oldnum))
newnum = 45000 + 3
newIDs = Range("G3", Range("G" & newnum))
'Using arrays to search, but bombs out when oldnum or newnum are ~70000
For n = 1 To newnum - 3
m = Application.VLookup(newIDs(n, 1), oldIDs, 1, False)
If IsError(m) Then Range("E100000").End(xlUp).Offset(1, 0) = newIDs(n, 1)
Next n
'Using collections to search
For n = 1 To oldnum
On Error Resume Next
oldcoll.Add oldIDs(n, 1)
On Error GoTo 0
Next n
For m = 1 To newnum
On Error Resume Next
newcoll.Add newIDs(m, 1)
On Error GoTo 0
Next m
'This bit of code doesn't work
For a = 1 To newcoll.Count
If Application.VLookup(newcoll(a), oldcoll, 1, False) = "#N/A" Then _
Range("E100000").End(xlUp).Offset(1, 0) = newcoll(a)
Next a
End Sub
Any ideas how I can determine whether a particular item is in the master list using collections?
Here is a short sub demonstrating some of the scripting dictionary methods.
Sub list_New_Unique()
Dim dMASTER As Object, dNEW As Object, k As Variant
Dim v As Long, vVALs() As Variant, vNEWs() As Variant
Debug.Print "Start: " & Timer
Set dMASTER = CreateObject("Scripting.Dictionary")
Set dNEW = CreateObject("Scripting.Dictionary")
dMASTER.comparemode = vbTextCompare
dNEW.comparemode = vbTextCompare
With Worksheets("Sheet7")
vVALs = .Range("A2:A100000").Value2
vNEWs = .Range("C2:C100000").Value2
End With
'populate the dMASTER values
For v = LBound(vVALs, 1) To UBound(vVALs, 1)
dMASTER.Add Key:=vVALs(v, 1), Item:=vVALs(v, 1)
Next v
'only populate dNEW with items not found in dMASTER
For v = LBound(vNEWs, 1) To UBound(vNEWs, 1)
If Not dMASTER.exists(vNEWs(v, 1)) Then
If Not dNEW.exists(vNEWs(v, 1)) Then _
dNEW.Add Key:=vNEWs(v, 1), Item:=vNEWs(v, 1)
End If
Next v
Debug.Print dNEW.Count
For Each k In dNEW.keys
'Debug.Print k
Next k
Debug.Print "End: " & Timer
dNEW.RemoveAll: Set dNEW = Nothing
dMASTER.RemoveAll: Set dMASTER = Nothing
End Sub
With 99,999 unique entries in A2:A100000 and 89747 random entries in C2:C89747, this found 70,087 unique new entries not found in A2:A100000 in 9.87 seconds.
这篇关于搜索集合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!