Remove duplicates from array of objects
I have a class called Customer that has several string properties like firstName, lastName, email, etc.
I read the customer information from a CSV file into an array of the class:
Customer[] customers
I need to remove the duplicate customers having the same email address, leaving only 1 customer record for each particular email address.
I have done this using 2 loops but it takes nearly 5 minutes as there are usually 50,000+ customer records. Once I am done removing the duplicates, I need to write the customer information to another csv file (no help needed here).
If I did a Distinct in a loop, how would I remove the other string variables that are a part of the class for that particular customer as well?
Thanks, Andrew
With Linq, you can do this in O(n) time (a single-level loop) with a GroupBy:
// persons is the array of customers; requires "using System.Linq;"
var uniquePersons = persons.GroupBy(p => p.Email)
                           .Select(grp => grp.First())
                           .ToArray();
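As a minimal, self-contained sketch of the query above: the Customer class here is a hypothetical stand-in with only the three properties named in the question, and the class and method names (CustomerDeduplicator, RemoveDuplicateEmails) are made up for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the question's class; only three properties shown.
public class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
}

public static class CustomerDeduplicator
{
    // Keeps the first customer encountered for each distinct email address.
    // All other properties travel with the retained object, so nothing
    // needs to be removed field by field.
    public static Customer[] RemoveDuplicateEmails(Customer[] customers)
    {
        return customers
            .GroupBy(c => c.Email)          // one group per distinct email
            .Select(grp => grp.First())     // first record of each group
            .ToArray();
    }
}
```

GroupBy yields its groups in the order in which each key first appears in the source, so the result keeps the first record per address in the original order. Note that the default string comparison is case-sensitive; if addresses differing only in case should count as duplicates, GroupBy has an overload that takes an IEqualityComparer such as StringComparer.OrdinalIgnoreCase.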
Update
A bit on the O(n) behavior of GroupBy.
GroupBy is implemented in Linq (Enumerable.cs) as follows: the IEnumerable is iterated only once to create the groupings. A hash of the key provided (e.g. Email here) is used to find unique keys, and each element is added to the Grouping corresponding to its key.
Please see the GetGrouping code, and some older posts for reference:
- What's the asymptotic complexity of GroupBy operation?
- What guarantees are there on the run-time complexity (Big-O) of LINQ methods?
Then Select is obviously O(n) as well, making the above code O(n) overall.
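To make the single-pass, hash-based idea concrete, here is a rough sketch that deduplicates with a HashSet<string> directly. It is not the Linq source, only an illustration of the same principle; the Customer class and the names HashDedup and FirstPerEmail are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the question's class.
public class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
}

public static class HashDedup
{
    // Walks the input once, keeping the first customer seen per email.
    public static List<Customer> FirstPerEmail(IEnumerable<Customer> customers)
    {
        var seen = new HashSet<string>();
        var result = new List<Customer>();

        foreach (var customer in customers)
        {
            // HashSet.Add returns false when the value is already present,
            // so only the first occurrence of each email is kept.
            if (seen.Add(customer.Email))
            {
                result.Add(customer);
            }
        }

        return result;
    }
}
```

Each Add is a hash lookup that takes constant time on average, so the whole loop is O(n) on average, compared with the O(n²) of two nested loops over 50,000+ records. On .NET 6 or later, customers.DistinctBy(c => c.Email) expresses the same operation in a single call.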
Update 2
To handle empty/null values.
So, if there are instances where the value of Email is null or empty, the simple GroupBy will keep just one object with a null Email and one with an empty Email.
One quick way to keep all the objects with a null/empty value is to generate a unique key at run time for each of those objects, like:
var tempEmailIndex = 0;
var uniqueNullAndEmpty = persons
.GroupBy(p => string.IsNullOrEmpty(p.Email)
? (++tempEmailIndex).ToString() : p.Email)
.Select(grp => grp.First())
.ToArray();
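The counter above works, but it relies on a side effect inside the key selector, and a generated key such as "1" could in principle collide with a real Email value. As an alternative sketch (not the method from the answer above), the records without an email can be set aside and appended unchanged. The Customer class and the names NullSafeDedup and DedupKeepingBlankEmails are hypothetical, and this variant moves the null/empty records to the end of the output.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the question's class.
public class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
}

public static class NullSafeDedup
{
    // Deduplicates by email but keeps every record whose email is null or empty.
    public static Customer[] DedupKeepingBlankEmails(Customer[] customers)
    {
        // Records with a usable email: one per distinct address.
        var withEmail = customers
            .Where(c => !string.IsNullOrEmpty(c.Email))
            .GroupBy(c => c.Email)
            .Select(grp => grp.First());

        // Records without an email are never treated as duplicates of each other.
        var withoutEmail = customers.Where(c => string.IsNullOrEmpty(c.Email));

        return withEmail.Concat(withoutEmail).ToArray();
    }
}
```

This makes two passes over the array instead of one, which is still O(n) overall.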