Find duplicates in a column


Question


The code below counts the duplicates in a given column and reports that count, but what I need is this: if there are any duplicates, write "Duplicate found" in the adjacent cell. For example, if cells F3, F4 and F15 contain the same value, then "Duplicate found" should appear in cells G3, G4 and G15. (The required blank column, column "G", is already present, since I am validating column "F".)

Dim helperCol As Range
Dim count As Long

With Worksheets("Sheet1")
    Set helperCol = .UsedRange.Resize(, 1).Offset(, .UsedRange.Columns.count)
    With .Range("F1", .Cells(.Rows.count, 6).End(xlUp))
        helperCol.Value = .Value
        helperCol.RemoveDuplicates Columns:=1, Header:=xlYes
        count = .SpecialCells(xlCellTypeConstants).count - helperCol.SpecialCells(xlCellTypeConstants).count
    End With
    helperCol.ClearContents
End With

If count >= 1 Then
    Range(count, "G") =   " Duplicate/s found"
End If


The output should look like this (I added the bold formatting only to make it easier to follow; it isn't required):

Answer


This code writes "Duplicate Found" in the cell one column to the right (i.e. column "G") of every cell in column "F" whose value appears more than once.

Option Explicit

Sub Test()

    Dim CEL As Range, RANG As Range

    With Worksheets("Sheet1")

        ' Build a range (RANG) between cell F2 and the last used cell in column F
        ' (.Range, not Range, so it refers to Sheet1 even if another sheet is active)
        Set RANG = .Range(.Cells(2, "F"), .Cells(.Rows.Count, "F").End(xlUp))

    End With

    ' For each cell (CEL) in this range (RANG)
    For Each CEL In RANG

        ' Skip blank cells; if CEL's value occurs more than once in RANG,
        ' write "Duplicate Found" one column to the right of CEL (i.e. column G)
        If CEL.Value <> "" Then
            If Application.WorksheetFunction.CountIf(RANG, CEL.Value) > 1 Then CEL.Offset(, 1).Value = "Duplicate Found"
        End If

    Next CEL

End Sub
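To try `Test` against the example from the question, the hypothetical harness below (`TryTestOnSample` is an illustrative name, not part of the answer) fills F2:F16 of Sheet1 with unique values except F3, F4 and F15, which share one value, runs `Test`, and prints the flagged rows to the Immediate window. It assumes a scratch workbook with a sheet named "Sheet1" and `Test` in the same VBA project; it overwrites F2:G16, so don't run it on real data.

```vba
Sub TryTestOnSample()
    ' Hypothetical harness: OVERWRITES F2:G16 on Sheet1 with sample data
    Dim r As Long

    With Worksheets("Sheet1")
        .Activate ' so the example works regardless of which sheet was active
        .Range("F2:G16").ClearContents

        ' Unique values in F2:F16...
        For r = 2 To 16
            .Cells(r, "F").Value = "Item" & r
        Next r

        ' ...except F3, F4 and F15, which share one value (as in the question)
        .Range("F3").Value = "Apple"
        .Range("F4").Value = "Apple"
        .Range("F15").Value = "Apple"
    End With

    Test ' the CountIf-based macro above

    ' Print every row that was flagged in column G
    With Worksheets("Sheet1")
        For r = 2 To 16
            If .Cells(r, "G").Value = "Duplicate Found" Then Debug.Print "Row " & r & " flagged"
        Next r
    End With
End Sub
```

With that sample data, rows 3, 4 and 15 should be the only ones flagged.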


Another option is to use a Dictionary (first add a reference to Microsoft Scripting Runtime), which stores each unique value together with the cell where it first appears. As you move down the range, you fill up the Dictionary, and whenever a value is already in it, you write "Duplicate Found" next to both the first occurrence and the current cell.

In the VBA editor, this reference is added under Tools > References.

Sub Test2()

    Dim CEL As Range, RANG As Range
    Dim dict As New Scripting.Dictionary

    With Worksheets("Sheet1")

        ' Build a range (RANG) between cell F2 and the last used cell in column F
        ' (.Range, not Range, so it refers to Sheet1 even if another sheet is active)
        Set RANG = .Range(.Cells(2, "F"), .Cells(.Rows.Count, "F").End(xlUp))

    End With

    ' For each cell (CEL) in this range (RANG)
    For Each CEL In RANG

        If CEL.Value <> "" Then ' ignore blank cells

            If Not dict.Exists(CEL.Value) Then ' if the value hasn't been seen yet
                dict.Add CEL.Value, CEL ' add the value and first-occurrence-of-value-cell to the dictionary
            Else ' if the value has already been seen
                CEL.Offset(, 1).Value = "Duplicate Found" ' set the value of the cell 1 across to the right of CEL (i.e. column G) as "Duplicate Found"
                dict(CEL.Value).Offset(, 1).Value = "Duplicate Found" ' set the value of the cell 1 across to the right of first-occurrence-of-value-cell (i.e. column G) as "Duplicate Found"
            End If

        End If

    Next CEL

    Set dict = Nothing

End Sub
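If adding the reference is inconvenient, the same Dictionary idea can be written with late binding through `CreateObject("Scripting.Dictionary")`, which needs no reference (at the cost of IntelliSense and compile-time checking). The sketch below, `Test2LateBound` (a hypothetical name), also makes two assumptions about what counts as a duplicate: it sets `CompareMode` to `vbTextCompare` so that "abc" and "ABC" match, as they do for `CountIf` in `Test`, and it uses `CStr` of each value as the key, so the number 5 and the text "5" are treated alike. Error values such as #N/A are skipped.

```vba
Sub Test2LateBound()
    ' Hypothetical late-bound variant of Test2: no reference to
    ' Microsoft Scripting Runtime is required
    Dim CEL As Range, RANG As Range
    Dim dict As Object
    Dim valueKey As String

    Set dict = CreateObject("Scripting.Dictionary")
    dict.CompareMode = vbTextCompare ' case-insensitive keys; must be set while dict is empty

    With Worksheets("Sheet1")
        Set RANG = .Range(.Cells(2, "F"), .Cells(.Rows.Count, "F").End(xlUp))
    End With

    For Each CEL In RANG
        If Not IsError(CEL.Value) Then ' skip #N/A and other error values
            valueKey = CStr(CEL.Value) ' string keys, so 5 and "5" are treated alike
            If Len(valueKey) > 0 Then ' ignore blank cells
                If Not dict.Exists(valueKey) Then
                    dict.Add valueKey, CEL ' remember the first cell holding this value
                Else
                    ' Flag the current cell and the first occurrence (column G)
                    CEL.Offset(, 1).Value = "Duplicate Found"
                    dict.Item(valueKey).Offset(, 1).Value = "Duplicate Found"
                End If
            End If
        End If
    Next CEL
End Sub
```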


In theory, this should be faster because it makes fewer passes over the range: CountIf scans the whole range for every cell it checks, so 1 million cells means on the order of 1 million × 1 million comparisons, whereas this method visits each cell once. I'm unsure how expensive the Dictionary object itself is. It grows as you check each cell, so later lookups might become somewhat slower, but that should still cost far less than rescanning the whole range for every cell; besides, the Dictionary can only grow as large as the number of unique values.
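Following the same reasoning, a further (untested) speed-up is to stop touching cells one at a time: read column F into an array with a single `.Value` call, count each value in a Dictionary, then write the whole of column G back in one assignment. The sketch below, `FlagDuplicatesArray` (a hypothetical name), assumes the data starts at F2 under a header in F1 and that column G next to the data may be overwritten; rows that are not duplicates are written as empty, which also clears stale flags from earlier runs. Like `Test`, it treats "abc" and "ABC" as the same value (via `vbTextCompare`), which is an assumption about the desired behavior.

```vba
Sub FlagDuplicatesArray()
    ' Hypothetical array-based variant: one read of column F, one write to column G
    Dim RANG As Range
    Dim vals As Variant, flags() As Variant
    Dim counts As Object
    Dim i As Long, k As String

    With Worksheets("Sheet1")
        Set RANG = .Range(.Cells(2, "F"), .Cells(.Rows.Count, "F").End(xlUp))
    End With

    ' A single-cell range returns a scalar, so wrap it in a 1x1 array
    If RANG.Cells.Count = 1 Then
        ReDim vals(1 To 1, 1 To 1)
        vals(1, 1) = RANG.Value
    Else
        vals = RANG.Value
    End If

    ReDim flags(1 To UBound(vals, 1), 1 To 1) ' elements start out Empty
    Set counts = CreateObject("Scripting.Dictionary")
    counts.CompareMode = vbTextCompare ' case-insensitive, like CountIf

    ' Pass 1: count how often each non-blank, non-error value occurs
    For i = 1 To UBound(vals, 1)
        If Not IsError(vals(i, 1)) Then
            k = CStr(vals(i, 1))
            ' Reading a missing key adds it with Empty, and Empty + 1 = 1
            If Len(k) > 0 Then counts.Item(k) = counts.Item(k) + 1
        End If
    Next i

    ' Pass 2: flag every row whose value occurs more than once
    For i = 1 To UBound(vals, 1)
        If Not IsError(vals(i, 1)) Then
            k = CStr(vals(i, 1))
            If Len(k) > 0 Then
                If counts.Item(k) > 1 Then flags(i, 1) = "Duplicate Found"
            End If
        End If
    Next i

    ' Write all of column G (next to RANG) in a single operation
    RANG.Offset(, 1).Value = flags
End Sub
```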

For additional performance gains:

  1. Set at the start of the code:

Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual

  2. Restore at the end of the code:

Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True
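If the macro stops with an error between those two steps, Excel is left with screen updating off and calculation set to manual. A common pattern, sketched below under the hypothetical name `RunWithFastSettings`, saves the current settings, does the work, and restores them on both the normal path and the error path. It restores the previous calculation mode rather than forcing automatic, and it assumes `Test2` (which needs the Microsoft Scripting Runtime reference) is the macro being sped up.

```vba
Sub RunWithFastSettings()
    ' Hypothetical wrapper: speeds up a macro and always restores Excel's settings
    Dim prevCalc As XlCalculation
    Dim prevScreen As Boolean

    prevCalc = Application.Calculation
    prevScreen = Application.ScreenUpdating

    On Error GoTo CleanUp ' jump to the restore code if anything below fails
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual

    Test2 ' the macro to speed up (assumed to be in the same project)

CleanUp:
    ' Reached on success and on error; Err.Number is still set on the error path
    Application.Calculation = prevCalc
    Application.ScreenUpdating = prevScreen
    If Err.Number <> 0 Then MsgBox "Error " & Err.Number & ": " & Err.Description
End Sub
```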
    
