Remove duplicates from array of objects


Question

I have a class called Customer that has several string properties like

firstName, lastName, email, etc.  

I read the customer information from a CSV file into an array of that class:

Customer[] customers  

I need to remove the duplicate customers having the same email address, leaving only 1 customer record for each particular email address.

I have done this using 2 loops but it takes nearly 5 minutes as there are usually 50,000+ customer records. Once I am done removing the duplicates, I need to write the customer information to another csv file (no help needed here).

If I did a Distinct in a loop how would I remove the other string variables that are a part of the class for that particular customer as well?

Thanks, Andrew

Solution

With LINQ, you can do this in O(n) expected time (a single pass over the data) with GroupBy:

var uniquePersons = persons.GroupBy(p => p.Email)
                           .Select(grp => grp.First())
                           .ToArray();
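
As a hedged variation on the query above: email addresses are often treated as case-insensitive, and that is an assumption the question does not state. The sketch below passes a comparer to GroupBy, and also shows DistinctBy, which is available on .NET 6 and later. The sample data and the named tuple standing in for the Customer class are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical sample data; a named tuple stands in for the Customer class.
var customers = new (string FirstName, string LastName, string Email)[]
{
    ("Ann", "Lee", "ann@example.com"),
    ("Bob", "Ray", "bob@example.com"),
    ("Anna", "Lee", "ANN@example.com"), // same address, different casing
};

// GroupBy overload that accepts an IEqualityComparer<TKey> for the keys.
// Assumption: addresses that differ only by case count as duplicates.
var uniqueByGroup = customers
    .GroupBy(c => c.Email, StringComparer.OrdinalIgnoreCase)
    .Select(g => g.First())
    .ToArray();

// .NET 6 and later: DistinctBy keeps the first element seen for each key.
var uniqueByDistinct = customers
    .DistinctBy(c => c.Email, StringComparer.OrdinalIgnoreCase)
    .ToArray();

Console.WriteLine(uniqueByGroup.Length);    // 2
Console.WriteLine(uniqueByDistinct.Length); // 2
```

Both forms keep the first record encountered for each address, so the order of the input decides which duplicate survives.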

Update

A note on the O(n) behavior of GroupBy.

GroupBy is implemented in LINQ (Enumerable.cs) as follows: the source IEnumerable is iterated only once to create the groupings. A hash of the provided key (Email here) is used to find unique keys, and each element is added to the Grouping that corresponds to its key.

Please see the GetGrouping code, and some older posts, for reference.

The Select that follows is also O(n), which makes the code above O(n) overall.
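
To make the single-pass idea concrete, here is a simplified illustration of hash-based grouping. It is not the actual Enumerable.cs source; the Dictionary, the variable names, and the sample addresses are all assumptions made for this sketch. Each element costs one hash lookup on average, which is where the O(n) total comes from.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical input: only the keys (email addresses) matter here.
var emails = new[] { "a@x.com", "b@x.com", "a@x.com", "c@x.com", "b@x.com" };

// Maps each key to the indexes of the elements that share it.
var groups = new Dictionary<string, List<int>>();
// Remembers the order in which keys were first seen.
var keyOrder = new List<string>();

for (var i = 0; i < emails.Length; i++)
{
    // One hash lookup per element: find the bucket or create it.
    if (!groups.TryGetValue(emails[i], out var bucket))
    {
        bucket = new List<int>();
        groups.Add(emails[i], bucket);
        keyOrder.Add(emails[i]);
    }
    bucket.Add(i);
}

// Equivalent of Select(grp => grp.First()): first index in each group.
var firstIndexes = keyOrder.ConvertAll(k => groups[k][0]);
Console.WriteLine(string.Join(", ", firstIndexes)); // 0, 1, 3
```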

Update 2

Handling null/empty values.

If there are instances where Email is null or empty, the simple GroupBy keeps only one object with a null Email and one with an empty Email.

One quick way to keep all objects with a null/empty Email is to give each of them a unique key at run time, like this:

var tempEmailIndex = 0;
var uniqueNullAndEmpty = persons
                         .GroupBy(p => string.IsNullOrEmpty(p.Email) 
                                       ? (++tempEmailIndex).ToString() : p.Email)
                         .Select(grp => grp.First())
                         .ToArray();
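
One caveat with generated keys is that a counter value such as "1" could, in principle, collide with a real Email value. A possible alternative, sketched here under the assumption that every record without an address should be kept, is to track seen addresses in a HashSet and skip the check for null/empty values. The sample data and the named tuple are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; Email may be null or empty.
var persons = new (string Name, string? Email)[]
{
    ("Ann", "ann@example.com"),
    ("NoMail1", null),
    ("Ann2", "ann@example.com"), // duplicate address, should be dropped
    ("NoMail2", ""),
    ("NoMail3", null),
};

// HashSet.Add returns false when the value is already present,
// so only the first record for each address passes the filter.
var seen = new HashSet<string>();
var result = persons
    .Where(p => string.IsNullOrEmpty(p.Email) || seen.Add(p.Email))
    .ToArray(); // materialize once; the predicate has a side effect

Console.WriteLine(string.Join(", ", result.Select(p => p.Name)));
// Ann, NoMail1, NoMail2, NoMail3
```

Because the predicate mutates `seen`, the query should be enumerated exactly once, which is why the result is materialized with ToArray right away.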
