搜索集合 [英] Searching collections

查看:96
本文介绍了搜索集合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用相当大的数据集(> 100,000行),并尝试比较两个列表,以确定新列表中的哪些项目尚未在主列表中。换句话说,我想找到新的独特的项目。



我有一些使用vlookup和数组的VBA代码,但数组太大时会弹出〜70,000)。所以我转向收藏。但是我无法使用vlookup或匹配搜索集合。

  Sub find_uniqueIDs()

Dim a As Long
Dim n As Long
Dim m As Variant
Dim oldnum As Long
Dim oldIDs As Variant
Dim oldcoll As New Collection
Dim newnum As Long
Dim newIDs As Variant
Dim newcoll As New Collection

oldnum = 75000
oldIDs = Range(A1,Range(A& oldnum))
newnum = 45000 + 3
newIDs = Range(G3,Range(G& newnum))

'使用数组搜索,但是当oldnum或newnum为〜70000
时为n = 1 to newnum - 3
m = Application.VLookup(newIDs(n,1),oldIDs,1,False)
如果IsError(m)Then Range(E100000)。End(xlUp)偏移(1,0)= newID(n,1)
下一步n

'使用集合搜索
对于n = 1 To oldnum
错误恢复下一步
oldcoll.Add oldIDs(n,1)
错误GoTo 0
下一个n

对于m = 1到newnum
在错误恢复下一个
newcoll.Add newID(m,1)
错误GoTo 0
下一个m

'这一段代码不起作用
对于a = 1到newcoll.Count
如果Application.VLookup(newcoll(a),oldcoll,1,False)=#N / AThen _
Range(E100000)。End(xlUp).Offset(1,0)= newcoll(a)
下一个

End Sub

任何想法如何确定特定项目是否在使用集合的主列表中?

解决方案

这是一个简短的子演示一些脚本字典方法。

  Sub list_New_Unique()
Dim dMASTER As Object,dNEW As Object,k As Variant
Dim v As Long,vVALs()As Variant,vNEWs()As Variant

Debug.Print开始:&定时器

设置dMASTER = CreateObject(Scripting.Dictionary)
设置dNEW = CreateObject(Scripting.Dictionary)
dMASTER.comparemode = vbTextCompare
dNEW。 comparemode = vbTextCompare

与工作表(Sheet7)
vVALs = .Range(A2:A100000)。Value2
vNEWs = .Range(C2:C100000) .Value2
结束

'填充dMASTER值
对于v = LBound(vVALs,1)到UBound(vVALs,1)
dMASTER.Add Key: = vVALs(v,1),项目:= vVALs(v,1)
下一个v

'只填充dNEW,在dMASTER
中找不到项目对于v = LBound vNEWs,1)到UBound(vNEWs,1)
如果不是dMASTER.exists(vNEWs(v,1))然后
如果不是dNEW.exists(vNEWs(v,1))Then _
dNEW.Add Key:= vNEWs(v,1),Item:= vNEWs(v,1)
End If
下一个v

Debug.Print dNEW.Count

对于每个k在dNEW.keys
'Debug.Print k
下一页k

Debug.PrintEnd:&定时器

dNEW.RemoveAll:设置dNEW = Nothing
dMASTER.RemoveAll:设置dMASTER = Nothing
End Sub

在A2:A100000中有99,999个唯一条目,C2:C89747中有89747个随机条目,这在9.87秒内发现没有在A2:A100000中找到70,087个独特的新条目。


I'm working with a rather large dataset (>100,000 rows) and trying to compare two lists to figure out which items in the new list are not already in the master list. In other words I want to find the new unique items.

I have some VBA code that uses vlookup and arrays that works, but bombs out when the arrays get too big (~70,000). So I've turned to collections. However I'm having difficulty searching the collections using vlookup or match.

Sub find_uniqueIDs()

Dim a As Long
Dim n As Long
Dim m As Variant
Dim oldnum As Long
Dim oldIDs As Variant
Dim oldcoll As New Collection
Dim newnum As Long
Dim newIDs As Variant
Dim newcoll As New Collection

oldnum = 75000
oldIDs = Range("A1", Range("A" & oldnum))
newnum = 45000 + 3
newIDs = Range("G3", Range("G" & newnum))

'Using arrays to search, but bombs out when oldnum or newnum are ~70000
For n = 1 To newnum - 3
    m = Application.VLookup(newIDs(n, 1), oldIDs, 1, False)
    If IsError(m) Then Range("E100000").End(xlUp).Offset(1, 0) = newIDs(n, 1)
Next n

'Using collections to search
For n = 1 To oldnum
On Error Resume Next
    oldcoll.Add oldIDs(n, 1)
On Error GoTo 0
Next n

For m = 1 To newnum
On Error Resume Next
    newcoll.Add newIDs(m, 1)
On Error GoTo 0
Next m

'This bit of code doesn't work
For a = 1 To newcoll.Count
If Application.VLookup(newcoll(a), oldcoll, 1, False) = "#N/A" Then _
    Range("E100000").End(xlUp).Offset(1, 0) = newcoll(a)
Next a

End Sub

Any ideas how I can determine whether a particular item is in the master list using collections?

解决方案

Here is a short sub demonstrating some of the scripting dictionary methods.

Sub list_New_Unique()
    Dim dMASTER As Object, dNEW As Object, k As Variant
    Dim v As Long, vVALs() As Variant, vNEWs() As Variant

    Debug.Print "Start: " & Timer

    Set dMASTER = CreateObject("Scripting.Dictionary")
    Set dNEW = CreateObject("Scripting.Dictionary")
    dMASTER.comparemode = vbTextCompare
    dNEW.comparemode = vbTextCompare

    With Worksheets("Sheet7")
        vVALs = .Range("A2:A100000").Value2
        vNEWs = .Range("C2:C100000").Value2
    End With

    'populate the dMASTER values
    For v = LBound(vVALs, 1) To UBound(vVALs, 1)
        dMASTER.Add Key:=vVALs(v, 1), Item:=vVALs(v, 1)
    Next v

    'only populate dNEW with items not found in dMASTER
    For v = LBound(vNEWs, 1) To UBound(vNEWs, 1)
        If Not dMASTER.exists(vNEWs(v, 1)) Then
            If Not dNEW.exists(vNEWs(v, 1)) Then _
                dNEW.Add Key:=vNEWs(v, 1), Item:=vNEWs(v, 1)
        End If
    Next v

    Debug.Print dNEW.Count

    For Each k In dNEW.keys
        'Debug.Print k
    Next k

    Debug.Print "End: " & Timer

    dNEW.RemoveAll: Set dNEW = Nothing
    dMASTER.RemoveAll: Set dMASTER = Nothing
End Sub

With 99,999 unique entries in A2:A100000 and 89747 random entries in C2:C89747, this found 70,087 unique new entries not found in A2:A100000 in 9.87 seconds.

这篇关于搜索集合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆