Find duplicates in a DataGridView
Question
I want to search a DataGridView (dgv) for duplicates and collect the row numbers of the duplicates in a list (to show them to the user if necessary). This is my code:
Function Check(ByVal dgv As DataGridView)
    Dim Duplicates As New List(Of Tuple(Of Integer, Integer))
    For i As Integer = 1 To dgv.RowCount
        For k As Integer = 1 To dgv.RowCount
            For j As Integer = 1 To dgv.ColumnCount
                Dim l As Integer
                If dgv.Rows(i).Cells(j).Value = dgv.Rows(k + 1).Cells(j).Value Then
                    l += l + 1
                    If l = dgv.ColumnCount Then
                        Duplicates.Add(Tuple.Create(i, k))
                    End If
                End If
            Next
        Next
    Next
    Return Duplicates
End Function
I actually have two questions:
- Since I am a beginner, I would like to know whether this is the best way to search for duplicates.
- I always get the error that Operator '=' is not defined for type 'DBNull'. I understand the error but don't know how to handle it. I tried:
Dim l As Integer
'DbNull - Check
Dim first As String
If IsDBNull(dgv.Rows(i).Cells(j).Value) Then
    first = 0
Else
    first = dgv.Rows(i).Cells(j).Value
End If
Now I check whether first = second instead of dgv.Rows(i).Cells(j).Value = dgv.Rows(k + 1).Cells(j).Value. But this gives me a type problem: the database column types are Date, varchar, integer and so on, which conflicts with Dim first As String. Does anyone know a way to get rid of the error?
Additional information: my dgv
- is bound to a DataTable which is connected to a SQL Server
- has 6 visible and 4 invisible columns
Recommended answer
You need to be finding duplicates in the underlying data table, not the DataGridView itself.
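As a rough sketch of how one might get from the grid to its underlying table: the helper below (GetBoundTable is a hypothetical name, not part of the original answer) assumes the grid is bound either directly to a DataTable or DataView, or through a BindingSource wrapping one of those. Other binding setups, such as a DataSet plus DataMember, are not handled here.

```vb
Imports System.Data
Imports System.Windows.Forms

Module GridHelpers
    ' Hypothetical helper: tries to recover the DataTable behind a bound grid.
    ' Returns Nothing when the grid is bound to something this sketch does not cover.
    Function GetBoundTable(dgv As DataGridView) As DataTable
        Dim source As Object = dgv.DataSource

        ' Unwrap a BindingSource if one sits between the grid and the data.
        Dim bs As BindingSource = TryCast(source, BindingSource)
        If bs IsNot Nothing Then source = bs.DataSource

        ' A DataView exposes the table it was built from.
        Dim view As DataView = TryCast(source, DataView)
        If view IsNot Nothing Then Return view.Table

        Return TryCast(source, DataTable)
    End Function
End Module
```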
Your current approach is inefficient because, for every row, it loops over all other rows, which is O(N^2). It can be reduced to a single pass by using a Dictionary(Of String, List(Of DataRow)).
Here is a code sample to illustrate the idea. Notice how the keyColumns HashSet is used to specify which columns determine whether a row is a duplicate (also known as the unique key).
' Sample table; columns added without a type default to String.
Dim dt As New DataTable
dt.Columns.Add("col1")
dt.Columns.Add("col2")
dt.Columns.Add("col3")
dt.Rows.Add({"val1", "val2", "val3"})
dt.Rows.Add({"val1", "val3", "val3"})
dt.Rows.Add({"val1", "val3", "val4"})

' Rows grouped by key; each key maps to every row that shares it.
Dim dict As New Dictionary(Of String, List(Of DataRow))
' Columns that together form the unique key.
Dim keyColumns As New HashSet(Of String)({"col1", "col3"})

For Each dr As DataRow In dt.Rows
    ' Build the key by concatenating the values of the key columns.
    Dim sbKey As New System.Text.StringBuilder
    For Each col As DataColumn In dt.Columns
        Dim colName As String = col.ColumnName
        If Not keyColumns.Contains(colName) Then Continue For
        Dim colValue As String = dr.Field(Of String)(colName)
        sbKey.Append(colValue & "@")
    Next
    Dim key As String = sbKey.ToString

    ' Create the list for a key on first sight, then add the row to it.
    Dim drList As List(Of DataRow) = Nothing
    If Not dict.TryGetValue(key, drList) Then
        drList = New List(Of DataRow)
        dict.Add(key, drList)
    End If
    drList.Add(dr)
Next
In the end, dict contains all data rows, organized by key. A key with exactly one row has no duplicates; a key with more than one row marks a set of duplicates. You can tweak this further to find only the keys whose rows have at least N duplicates (N>=1). This, for example:
Dim p = dict.Where(Function(x) x.Value.Count > 1)
will give you the subset of data rows for which at least one duplicate was found, including all conflicting rows (the original one included).
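To connect this back to the question (row numbers, mixed column types, and DBNull values), here is a minimal sketch of the same dictionary idea wrapped in a function. FindDuplicateRowIndexes is a hypothetical name. The sketch assumes that values can be compared through their invariant-culture string form and that no value contains the separator character ChrW(31). The returned numbers are positions in dt.Rows, which match the grid's row indexes only while the grid is unsorted and unfiltered.

```vb
Imports System.Collections.Generic
Imports System.Data
Imports System.Globalization
Imports System.Linq

Module DuplicateFinder
    ' Hypothetical helper: returns one list of row positions per group of rows
    ' whose values in keyColumns are all equal. Groups of size 1 are dropped.
    Function FindDuplicateRowIndexes(dt As DataTable,
                                     keyColumns As IEnumerable(Of String)) As List(Of List(Of Integer))
        Dim groups As New Dictionary(Of String, List(Of Integer))
        Dim cols As List(Of String) = keyColumns.ToList()

        For i As Integer = 0 To dt.Rows.Count - 1
            Dim dr As DataRow = dt.Rows(i)
            ' Values of deleted rows cannot be read, so skip them.
            If dr.RowState = DataRowState.Deleted Then Continue For

            Dim sbKey As New System.Text.StringBuilder
            For Each colName As String In cols
                If dr.IsNull(colName) Then
                    ' Marker for DBNull, so a null never equals a real value.
                    sbKey.Append("0")
                Else
                    ' Works for Date, Integer, String, ... without the '=' operator.
                    sbKey.Append("1")
                    sbKey.Append(Convert.ToString(dr(colName), CultureInfo.InvariantCulture))
                End If
                sbKey.Append(ChrW(31)) ' assumed not to occur inside the data
            Next

            Dim key As String = sbKey.ToString()
            Dim indexes As List(Of Integer) = Nothing
            If Not groups.TryGetValue(key, indexes) Then
                indexes = New List(Of Integer)
                groups.Add(key, indexes)
            End If
            indexes.Add(i)
        Next

        Return groups.Values.Where(Function(g) g.Count > 1).ToList()
    End Function

    Sub Main()
        Dim dt As New DataTable
        dt.Columns.Add("col1", GetType(String))
        dt.Columns.Add("col2", GetType(Integer))
        dt.Rows.Add("a", 1)
        dt.Rows.Add("a", DBNull.Value)
        dt.Rows.Add("a", 1)
        dt.Rows.Add("b", 1)

        ' Print the row positions of each duplicate group on its own line.
        For Each g In FindDuplicateRowIndexes(dt, {"col1", "col2"})
            Console.WriteLine(String.Join(", ", g.Select(Function(x) x.ToString())))
        Next
    End Sub
End Module
```

Because DBNull is turned into a marker before any comparison happens, the '=' operator is never applied to a DBNull value. Passing only the visible columns as keyColumns would restrict the check to what the user actually sees.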