Remove duplicates from array of objects

Views: 135  Published: 2017/7/21 1:10:00  Tags: c# arrays class duplicates


Problem description

I have a class called Customer that has several string properties, such as firstName, lastName, email, etc.

I read the customer information from a CSV file into an array of that class:

    Customer[] customers

I need to remove the duplicate customers that have the same email address, leaving only one customer record for each email address.

I have done this using 2 loops but it takes nearly 5 minutes as there are usually 50,000+ customer records.  Once I am done removing the duplicates, I need to write the customer information to another csv file (no help needed here).  

If I did a Distinct in a loop how would I remove the other string variables that are a part of the class for that particular customer as well?

Thanks,
Andrew

Solution

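With LINQ, you can do this in O(n) time (a single-level loop) with a GroupBy:

    // persons is the array of customer objects read from the CSV file
    var uniquePersons = persons.GroupBy(p => p.Email)
                               .Select(grp => grp.First())
                               .ToArray();

Because each group contains whole objects, the first element of a group carries all of its other properties (firstName, lastName, and so on) with it.
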
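
To make the snippet above concrete, here is a minimal self-contained sketch. The Customer class, its property names, the CustomerDedup helper, and the sample data are all assumptions made for illustration; the question does not show the real class.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the Customer class described in the question.
public class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
}

public static class CustomerDedup
{
    // Keeps the first customer encountered for each distinct Email value.
    public static Customer[] ByEmail(Customer[] customers)
    {
        return customers
            .GroupBy(c => c.Email)       // one group per distinct email
            .Select(grp => grp.First())  // first record of each group, all fields intact
            .ToArray();
    }

    public static void Main()
    {
        // Made-up sample data: two records share the same email address.
        var customers = new[]
        {
            new Customer { FirstName = "Ann",  LastName = "Lee", Email = "ann@example.com" },
            new Customer { FirstName = "Bob",  LastName = "Ray", Email = "bob@example.com" },
            new Customer { FirstName = "Anna", LastName = "Lee", Email = "ann@example.com" },
        };

        foreach (var c in ByEmail(customers))
            Console.WriteLine($"{c.FirstName} {c.LastName} <{c.Email}>");
    }
}
```

With this sample input, the record for "Anna" is dropped because an earlier record already uses ann@example.com. GroupBy yields groups in the order their keys first appear, so the surviving records keep their original relative order.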
Update

A note on the O(n) behavior of GroupBy.

GroupBy is implemented in LINQ (Enumerable.cs) as follows:

The IEnumerable is iterated only once to create the groupings. A hash of the provided key (here, Email) is used to find unique keys, and each element is added to the grouping that corresponds to its key.

See the GetGrouping code, and these older posts for reference:

    What's the asymptotic complexity of GroupBy operation?
    What guarantees are there on the run-time complexity (Big-O) of LINQ methods?

Select is then clearly an O(n) operation, which makes the above code O(n) overall (assuming the key hashes are reasonably well distributed).
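
The single-pass hashing idea described above can be sketched by hand with a HashSet. This is not the actual Enumerable.cs source, only an illustration of why one pass is enough; the Customer class and the HashDedup helper are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the Customer class described in the question.
public class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
}

public static class HashDedup
{
    // One pass over the input: HashSet<string>.Add returns false
    // when the email has already been seen, so the duplicate is skipped.
    public static List<Customer> ByEmail(IEnumerable<Customer> customers)
    {
        var seen = new HashSet<string>();
        var unique = new List<Customer>();
        foreach (var c in customers)
        {
            if (seen.Add(c.Email))
                unique.Add(c);
        }
        return unique;
    }

    public static void Main()
    {
        // Made-up sample data for illustration.
        var customers = new List<Customer>
        {
            new Customer { FirstName = "Ann", Email = "ann@example.com" },
            new Customer { FirstName = "Bob", Email = "bob@example.com" },
            new Customer { FirstName = "Anna", Email = "ann@example.com" },
        };

        Console.WriteLine(ByEmail(customers).Count);
    }
}
```

Each lookup and insert in the set takes constant time on average, so the loop as a whole is linear in the number of records, in contrast to the two nested loops from the question, which compare every record against every other.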

Update 2

Handling empty/null values.

If there are records where the value of Email is null or empty, the simple GroupBy treats null as one key and the empty string as another, so it keeps only one object from each of those two groups.

One quick way to keep all of the objects with a null or empty Email is to give each of them a unique key at run time, like this:

    // Counter used to generate a distinct key for every null/empty email
    var tempEmailIndex = 0;
    var uniqueNullAndEmpty = persons
                             .GroupBy(p => string.IsNullOrEmpty(p.Email)
                                           ? (++tempEmailIndex).ToString() : p.Email)
                             .Select(grp => grp.First())
                             .ToArray();
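
As a possible variation on the approach above (not part of the original answer), the records without an email can be set aside and appended back untouched, which avoids generating synthetic keys inside the key selector. The Customer class and the BlankAwareDedup helper are hypothetical names used only for this sketch.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the Customer class described in the question.
public class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
}

public static class BlankAwareDedup
{
    // Deduplicates by Email but keeps every record whose Email is null or empty.
    public static Customer[] ByEmailKeepingBlanks(Customer[] customers)
    {
        // Records with a real email: one per distinct address.
        var withEmail = customers
            .Where(c => !string.IsNullOrEmpty(c.Email))
            .GroupBy(c => c.Email)
            .Select(grp => grp.First());

        // Records without an email: all of them are kept as-is.
        var withoutEmail = customers.Where(c => string.IsNullOrEmpty(c.Email));

        return withEmail.Concat(withoutEmail).ToArray();
    }

    public static void Main()
    {
        // Made-up sample data: one duplicate email, one null, one empty.
        var customers = new[]
        {
            new Customer { FirstName = "Ann",  Email = "ann@example.com" },
            new Customer { FirstName = "NoMail1", Email = null },
            new Customer { FirstName = "Anna", Email = "ann@example.com" },
            new Customer { FirstName = "NoMail2", Email = "" },
        };

        Console.WriteLine(ByEmailKeepingBlanks(customers).Length);
    }
}
```

One trade-off of this sketch is ordering: the records without an email end up after all the records with one, whereas the counter-based version keeps them in their original positions.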


                        