找重复的两列LINQ [英] Get duplicates for two columns with LINQ

查看:236
本文介绍了找重复的两列LINQ的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

LINQ要把我逼疯了。为什么下面的查询不返回重复的,而它的工作原理,只有一个标识?哪里是我的错误?

LINQ drives me crazy. Why does following query not return the duplicates, whereas it works with only one identifier? Where is my error?

' generate some test-data '
Dim source As New DataTable
source.Columns.Add(New DataColumn("RowNumber", GetType(Int32)))
source.Columns.Add(New DataColumn("Value1", GetType(Int32)))
source.Columns.Add(New DataColumn("Value2", GetType(Int32)))
source.Columns.Add(New DataColumn("Text", GetType(String)))
Dim rnd As New Random()
For i As Int32 = 1 To 100
    Dim newRow = source.NewRow
    Dim value = rnd.Next(1, 20)
    newRow("RowNumber") = i
    newRow("Value1") = value
    newRow("Value2") = (value + 1)
    newRow("Text") = String.Format("RowNumber{0}-Text", i)
    source.Rows.Add(newRow)
Next
' following query does not work, it always has Count=0 '
' although it works with only one identifier '
Dim dupIdentifiers = From row In source
         Group row By grp = New With {.Val1 = row("Value1"), .Val2 = row("Value2")}
         Into Group
         Where Group.Count > 1
         Select idGroup = New With {grp.Val1, grp.Val2, Group.Count}


修改:下面是完整的解决方案,这要归功于<一个href="http://stackoverflow.com/questions/7530906/get-duplicates-for-two-columns-with-linq/7530944#7530944">@Jon飞碟双向的答案:)


Edit: Following is the complete solution, thanks to @Jon Skeet's answer :)

Dim dupKeys = From row In source
        Group row By grp = New With {Key .Val1 = CInt(row("Value1")), Key .Val2 = CInt(row("Value2"))}
        Into Group Where Group.Count > 1
        Select RowNumber = CInt(Group.FirstOrDefault.Item("RowNumber"))

Dim dupRows = From row In source
        Join dupKey In dupKeys 
        On row("RowNumber") Equals dupKey 
        Select row

If dupRows.Any Then
    ' create a new DataTable from the first duplicate rows '
    Dim dest = dupRows.CopyToDataTable
End If

与分组的主要问题是,我必须让他们属性。 在我上面的code接下来的问题是,从原始表中得到重复的行。 因为几乎所有的行具有重复(根据两个字段),则结果数据表中包含99的100行,而不是仅在19重复值。我需要只选择第一个重复的行,并加入他们的PK原始表。

The main problem with grouping was that i must make them key properties. The next problem in my above code was to get the duplicate rows from the original table. Because nearly every row has a duplicate(according to two fields), the result DataTable contained 99 of 100 rows and not only the 19 duplicate values. I needed to select only the first duplicate row and join them with the original table on the PK.

Select RowNumber = CInt(Group.FirstOrDefault.Item("RowNumber"))

虽然这部作品在我的情况下,也许有人可以解释我如何从原始表中只选择重复的,如果我将不得不只组合键。

Although this works in my case, maybe someone can explain me how to select only the duplicates from the original table if i would have had only composite keys.

修改:我心中已经回答了这个问题的最后一部分我自己,所以这里是我需要的:

Edit: I'v answered the last part of the question myself, so here is all i need:

Dim dups = From row In source
         Group By grp = New With {Key .Value1 = CInt(row("Value1")), Key .Value2 = CInt(row("Value2"))}
         Into Group Where Group.Count > 1
         Let Text = Group.First.Item("Text")
         Select Group.First

If dups.Any Then
      Dim dest = dups.CopyToDataTable
End If

我所需要的<一个href="http://stackoverflow.com/questions/641931/is-linqs-let-keyword-better-than-its-into-keyword/641944#641944">Let-Keyword为了保持另一列(多个)到相同的上下文,并返回分组的DUP只有第一行。在这条路上,我可以使用 CopyToDataTable 以创建从重复行的数据表。

I needed the Let-Keyword in order to keep the other column(s) into the same context and return only the first row of the grouped dups. On this way i can use CopyToDataTable to create a DataTable from the duplicate rows.

只有几行(我能救第二个查询找到原始表中的行)找到多个列的重复和创建它们的数据表。

Only a few lines of code overall (i can save the second query to find the rows in the original table) to find duplicates on multiple columns and create a DataTable of them.

推荐答案

这个问题是这样的匿名类型的工作在VB中 - 他们是可变的默认;仅属性包含的散列和平等。试试这个:

The problem is the way anonymous types work in VB - they're mutable by default; only Key properties are included for hashing and equality. Try this:

Group row By grp = New With {Key .Val1 = row("Value1"), Key .Val2 = row("Value2")}

(在C#这不会是一个问题 - 匿名类型在C#中始终不变的所有属性)

(In C# this wouldn't be a problem - anonymous types in C# are always immutable in all properties.)

这篇关于找重复的两列LINQ的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆