Finding duplicates in a List ignoring a field
Problem description
I've got a List of Persons and I want to find duplicate entries, considering all fields except id. So using the equals() method (and in consequence List.contains()) is not an option, because they take id into consideration.
public class Person {
    private String firstname, lastname;
    private int age;
    private long id;
}
Modifying the equals() and hashCode() methods to ignore the id field is not an option because other parts of the code rely on their current behaviour.
What's the most efficient way in Java to sort out the duplicates if I want to ignore the id field?
Recommended answer
Build a Comparator<Person> that implements your natural-key ordering (every field except id) and then use a binary-search based deduplication. TreeSet will give you this ability out of the box: it treats two elements as the same whenever the comparator returns 0, regardless of equals().
Note that Comparator<T>.compare(a, b) must fulfil the usual antisymmetry, transitivity, consistency and reflexivity requirements or the binary search ordering will fail. You should also make it null-aware (e.g. for the case where the firstname field of one, the other or both Persons is null).
A simple natural-key comparator for your Person class is as follows (it is a static member class because you haven't shown whether you have accessors for each field).
import java.util.Comparator;

public class Person {
    // Orders Persons by every field except id (the "natural key").
    public static class NkComparator implements Comparator<Person>
    {
        @Override
        public int compare(Person p1, Person p2)
        {
            if (p1 == null || p2 == null) throw new NullPointerException();
            if (p1 == p2) return 0;
            int i = nullSafeCompareTo(p1.firstname, p2.firstname);
            if (i != 0) return i;
            i = nullSafeCompareTo(p1.lastname, p2.lastname);
            if (i != 0) return i;
            // Integer.compare avoids the overflow that p1.age - p2.age can hit
            return Integer.compare(p1.age, p2.age);
        }

        // Treats null as smaller than any non-null String.
        private static int nullSafeCompareTo(String s1, String s2)
        {
            return (s1 == null)
                ? (s2 == null) ? 0 : -1
                : (s2 == null) ? 1 : s1.compareTo(s2);
        }
    }

    private String firstname, lastname;
    private int age;
    private long id;
}
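On Java 8 or later, the same ordering can probably be expressed more compactly with the Comparator factory methods. The following is only a sketch under assumptions: the accessors (getFirstname, getLastname, getAge, getId), the constructor and the NaturalKeyDemo wrapper are hypothetical, since the question does not show whether Person has them.

```java
import java.util.Comparator;

class NaturalKeyDemo {
    // Hypothetical stand-in for the question's Person, with accessors added.
    static final class Person {
        private final String firstname, lastname;
        private final int age;
        private final long id;

        Person(String firstname, String lastname, int age, long id) {
            this.firstname = firstname;
            this.lastname = lastname;
            this.age = age;
            this.id = id;
        }

        String getFirstname() { return firstname; }
        String getLastname() { return lastname; }
        int getAge() { return age; }
        long getId() { return id; }
    }

    // Null names sort before non-null ones, like nullSafeCompareTo does.
    static final Comparator<String> NULL_SAFE =
            Comparator.nullsFirst(Comparator.<String>naturalOrder());

    // Natural key: every field except id.
    static final Comparator<Person> NATURAL_KEY =
            Comparator.comparing(Person::getFirstname, NULL_SAFE)
                    .thenComparing(Person::getLastname, NULL_SAFE)
                    .thenComparingInt(Person::getAge);

    public static void main(String[] args) {
        Person a = new Person("Ann", "Lee", 30, 1L);
        Person b = new Person("Ann", "Lee", 30, 2L);
        // Same natural key, different id: the comparator reports them as equal.
        System.out.println(NATURAL_KEY.compare(a, b) == 0);
    }
}
```

As with the hand-written version, passing a null Person still raises a NullPointerException; only null name fields are tolerated.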
You can then use the comparator to generate a unique list. Use the add method, which returns true if and only if the element didn't already exist in the set:
List<Person> newList = new ArrayList<Person>();
TreeSet<Person> nkIndex = new TreeSet<Person>(new Person.NkComparator());
for (Person p : originalList)
    if (nkIndex.add(p)) newList.add(p); // to generate a unique list
Or swap the final line for this one to output the duplicates instead:
    if (!nkIndex.add(p)) newList.add(p); // to generate a list of duplicates
Whatever you do, don't use remove on your original list while you are enumerating it; that's why these methods add your unique elements to a new list.
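If the original list really does have to be modified in place, one option that should sidestep the problem is to remove through the list's own Iterator rather than through List.remove. The sketch below is an illustration only: the class InPlaceDedupDemo and the method removeDuplicatesInPlace are hypothetical names, and case-insensitive Strings stand in for Persons compared without their id. It also assumes the list's iterator supports remove, as ArrayList's does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class InPlaceDedupDemo {
    // Removes from 'list' every element the comparator considers equal
    // to an element that appeared earlier in the list.
    static <T> void removeDuplicatesInPlace(List<T> list, Comparator<? super T> naturalKey) {
        Set<T> seen = new TreeSet<T>(naturalKey);
        for (Iterator<T> it = list.iterator(); it.hasNext(); ) {
            if (!seen.add(it.next())) {
                it.remove(); // removal goes through the iterator, not the list
            }
        }
    }

    public static void main(String[] args) {
        // Strings compared case-insensitively stand in for Persons compared without id.
        List<String> names = new ArrayList<String>(Arrays.asList("Ann", "bob", "ANN", "Bob"));
        removeDuplicatesInPlace(names, String.CASE_INSENSITIVE_ORDER);
        System.out.println(names);
    }
}
```

Building a new list, as shown earlier, remains the simpler choice when the original list can be left untouched.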
If you are just interested in a unique list, and want to use as few lines as possible:
TreeSet<Person> set = new TreeSet<Person>(new Person.NkComparator());
set.addAll(originalList);
List<Person> newList = new ArrayList<Person>(set);
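One difference between the two approaches is worth keeping in mind: the loop version keeps elements in the order they appeared in the original list, whereas the short version returns them sorted by the comparator, because a TreeSet iterates in comparator order. A small sketch of the difference follows; the class DedupOrderDemo and both method names are hypothetical, and case-insensitive Strings again stand in for Persons.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class DedupOrderDemo {
    // Loop version: keeps the first occurrence of each key, in encounter order.
    static <T> List<T> uniqueInEncounterOrder(List<T> source, Comparator<? super T> naturalKey) {
        List<T> result = new ArrayList<T>();
        Set<T> seen = new TreeSet<T>(naturalKey);
        for (T item : source) {
            if (seen.add(item)) result.add(item);
        }
        return result;
    }

    // Short version: the result comes back sorted by the comparator.
    static <T> List<T> uniqueSorted(List<T> source, Comparator<? super T> naturalKey) {
        TreeSet<T> set = new TreeSet<T>(naturalKey);
        set.addAll(source);
        return new ArrayList<T>(set);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("bob", "Ann", "BOB", "ann");
        System.out.println(uniqueInEncounterOrder(names, String.CASE_INSENSITIVE_ORDER));
        System.out.println(uniqueSorted(names, String.CASE_INSENSITIVE_ORDER));
    }
}
```

In both versions the first occurrence of each key is the one that survives, since add leaves the set unchanged when an equal element is already present.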