TreeSet Comparator failed to remove duplicates in some cases?

Question

I have the following comparator for my TreeSet:

public class Obj {
    public int id;
    public String value;
    public Obj(int id, String value) {
        this.id = id;
        this.value = value;
    }
    public String toString() {
        return "(" + id + value + ")";
    }
}

Obj obja = new Obj(1, "a");
Obj objb = new Obj(1, "b");
Obj objc = new Obj(2, "c");
Obj objd = new Obj(2, "a");
Set<Obj> set = new TreeSet<>((a, b) -> {
    System.out.println("Comparing " + a + " and " + b);
    int result = a.value.compareTo(b.value);
    if (a.id == b.id) {
        return 0;
    }
    return result == 0 ? Integer.compare(a.id, b.id) : result;
});
set.addAll(Arrays.asList(obja, objb, objc, objd));
System.out.println(set);

It prints [(1a), (2c)], so the duplicates were removed.

But when I change the last Integer.compare to Integer.compare(b.id, a.id) (i.e. swap the positions of a and b), it prints [(2a), (1a), (2c)]. Clearly the same id 2 appears twice.

How do you fix the comparator to always remove the duplicates based on ids and sort the ordered set based on value (ascending) then id (descending)?

Recommended answer

You're asking:
How do you fix the comparator to always remove the duplicates based on ids and sort the ordered set based on value (ascending) then id (descending)?

You want the comparator to

  1. remove duplicates based on Obj.id
  2. sort the set by Obj.value and Obj.id

Requirement 1) results in

Function<Obj, Integer> byId = o -> o.id;
Set<Obj> setById = new TreeSet<>(Comparator.comparing(byId));

Requirement 2) results in

Function<Obj, String> byValue = o -> o.value;
Comparator<Obj> sortingComparator =  Comparator.comparing(byValue).thenComparing(Comparator.comparing(byId).reversed());
Set<Obj> setByValueAndId = new TreeSet<>(sortingComparator);

Let's have a look at the JavaDoc of TreeSet. It says:

Note that the ordering maintained by a set [...] must be consistent with equals if it is to correctly implement the Set interface. This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal.

The set will be ordered according to the comparator but its elements are also compared for equality using the comparator.
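
A minimal sketch of that behavior, using String.CASE_INSENSITIVE_ORDER as a stand-in comparator (the class and method names are made up for illustration): "a" and "A" are different according to equals, yet the TreeSet keeps only one of them because the comparator returns 0.

```java
import java.util.Set;
import java.util.TreeSet;

class ComparatorEqualityDemo {

    // Adds two strings that differ per equals() but compare as 0.
    static Set<String> caseInsensitiveSet() {
        Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        set.add("a");
        set.add("A"); // compare("A", "a") == 0, so the TreeSet treats it as a duplicate
        return set;
    }

    public static void main(String[] args) {
        System.out.println("a".equals("A"));      // false
        System.out.println(caseInsensitiveSet()); // [a]
    }
}
```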

As far as I can see, there is no way to define a Comparator that satisfies both requirements. Since a TreeSet is first and foremost a Set, requirement 1) has to be met. To achieve requirement 2) you can create a second TreeSet:

Set<Obj> setByValueAndId = new TreeSet<>(sortingComparator);
setByValueAndId.addAll(setById);
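
Put together, the two-set approach might look like the following self-contained sketch. The Obj class mirrors the one from the question; the wrapper class and the dedupeThenSort helper are hypothetical names used only for this example, and the extra (3, "a") element is the one used later in this answer.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

class TwoTreeSetsDemo {

    // Same shape as the Obj class from the question.
    static class Obj {
        final int id;
        final String value;
        Obj(int id, String value) { this.id = id; this.value = value; }
        @Override public String toString() { return "(" + id + value + ")"; }
    }

    // Hypothetical helper: first dedupe by id, then re-sort by value and id.
    static Set<Obj> dedupeThenSort(Obj... objs) {
        Function<Obj, Integer> byId = o -> o.id;
        Function<Obj, String> byValue = o -> o.value;

        // Step 1: duplicate removal, the comparator looks at the id only.
        Set<Obj> setById = new TreeSet<>(Comparator.comparing(byId));
        setById.addAll(Arrays.asList(objs));

        // Step 2: ordering by value ascending, then id descending.
        Comparator<Obj> sortingComparator =
                Comparator.comparing(byValue)
                          .thenComparing(Comparator.comparing(byId).reversed());
        Set<Obj> setByValueAndId = new TreeSet<>(sortingComparator);
        setByValueAndId.addAll(setById);
        return setByValueAndId;
    }

    public static void main(String[] args) {
        Set<Obj> result = dedupeThenSort(
                new Obj(1, "a"), new Obj(1, "b"), new Obj(2, "c"),
                new Obj(2, "a"), new Obj(3, "a"));
        System.out.println(result); // [(3a), (1a), (2c)]
    }
}
```

The second set cannot lose elements here: after step 1 every id is unique, so sortingComparator never returns 0 for two different elements.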

Or if you don't need the set itself but to process the elements in the desired order you can use a Stream:

Consumer<Obj> consumer = <your consumer>;
setById.stream().sorted(sortingComparator).forEach(consumer);

BTW:
While it's possible to sort the elements of a Stream according to a given Comparator there is no distinct method taking a Comparator to remove duplicates according to it.
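
One possible workaround, which is not part of the original answer, is to deduplicate by a key with Collectors.toMap and a merge function that keeps the first element seen. The sketch below assumes that "first one wins" is the desired rule; the class and method names are hypothetical.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DistinctByKeyDemo {

    // Same shape as the Obj class from the question.
    static class Obj {
        final int id;
        final String value;
        Obj(int id, String value) { this.id = id; this.value = value; }
        @Override public String toString() { return "(" + id + value + ")"; }
    }

    // Hypothetical helper: keeps the first Obj per id, then sorts the survivors.
    static List<Obj> distinctByIdSorted(Stream<Obj> objs) {
        Comparator<Obj> sortingComparator =
                Comparator.comparing((Obj o) -> o.value)
                          .thenComparing(Comparator.comparing((Obj o) -> o.id).reversed());
        return objs
                // The id is the map key; on a key clash the earlier element is kept.
                .collect(Collectors.toMap((Obj o) -> o.id, (Obj o) -> o,
                        (first, second) -> first, LinkedHashMap::new))
                .values().stream()
                .sorted(sortingComparator)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Obj> result = distinctByIdSorted(Stream.of(
                new Obj(1, "a"), new Obj(1, "b"), new Obj(2, "c"),
                new Obj(2, "a"), new Obj(3, "a")));
        System.out.println(result); // [(3a), (1a), (2c)]
    }
}
```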


You have two different tasks: 1. duplicate removal, 2. sorting. One Comparator cannot solve both tasks. So what alternatives are there?

You can override equals and hashCode on Obj. Then a HashSet or a Stream can be used to remove duplicates.
For the sorting you still need a Comparator (as shown above). Implementing Comparable just for sorting would result in an ordering which is not "consistent with equals" according to Comparable JavaDoc.

Since a Stream can solve both tasks, it would be my choice. First we override hashCode and equals to identify duplicates by id:

@Override
public int hashCode() {
    return Integer.hashCode(id);
}

@Override
public boolean equals(Object obj) {
    if (this == obj)
        return true;
    if (obj == null)
        return false;
    if (getClass() != obj.getClass())
        return false;
    Obj other = (Obj) obj;
    if (id != other.id)
        return false;
    return true;
}

Now we can use a Stream:

// instantiating one additional Obj and reusing those from the question
Obj obj3a = new Obj(3, "a");

// reusing sortingComparator from the code above
Set<Obj> set = Stream.of(obja, objb, objc, objd, obj3a)
        .distinct()
        .sorted(sortingComparator)
        .collect(Collectors.toCollection(LinkedHashSet::new));

System.out.println(set); // [(3a), (1a), (2c)]

The returned LinkedHashSet has the semantics of a Set, but it also preserves the ordering produced by sortingComparator.
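
For convenience, here is the Stream approach assembled into one runnable sketch. The equals method is condensed to an instanceof check (a simplification compared to the getClass version above), and the wrapper class and distinctSorted helper are hypothetical names.

```java
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamDistinctDemo {

    static class Obj {
        final int id;
        final String value;
        Obj(int id, String value) { this.id = id; this.value = value; }

        // Two Obj instances count as duplicates when their ids match.
        @Override public int hashCode() { return Integer.hashCode(id); }
        @Override public boolean equals(Object o) {
            return o instanceof Obj && ((Obj) o).id == id;
        }
        @Override public String toString() { return "(" + id + value + ")"; }
    }

    // Hypothetical helper: distinct() relies on equals/hashCode, sorted() on the comparator.
    static Set<Obj> distinctSorted(Obj... objs) {
        Comparator<Obj> sortingComparator =
                Comparator.comparing((Obj o) -> o.value)
                          .thenComparing(Comparator.comparing((Obj o) -> o.id).reversed());
        return Stream.of(objs)
                .distinct()                 // keeps the first element per id
                .sorted(sortingComparator)  // value ascending, then id descending
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        Set<Obj> set = distinctSorted(
                new Obj(1, "a"), new Obj(1, "b"), new Obj(2, "c"),
                new Obj(2, "a"), new Obj(3, "a"));
        System.out.println(set); // [(3a), (1a), (2c)]
    }
}
```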

EDIT (answering the questions from comments)

Q: Why didn't it finish the job correctly?
See for yourself. Change the last line of your Comparator as follows:

int r = result == 0 ? Integer.compare(a.id, b.id) : result;
System.out.println(String.format("a: %s / b: %s / result: %s -> %s", a.id, b.id, result, r));
return r;

Run the code once and then switch the operands of Integer.compare. The switch results in a different comparison path. The difference is when (2a) and (1a) are compared.

In the first run (2a) is greater than (1a) so it's compared with the next entry (2c). This results in equality - a duplicate is found.

In the second run (2a) is smaller than (1a). Thus (2a) would next be compared with an entry preceding (1a). But (1a) is already the smallest entry, so there is no preceding one. Hence no duplicate is found for (2a) and it's added to the set.
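
The two runs can be reproduced with a small sketch like the one below. The comparator body is taken from the question (without the logging); the swapped flag, the class name and the run helper are assumptions added for the demonstration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

class ComparatorPathDemo {

    // Same shape as the Obj class from the question.
    static class Obj {
        final int id;
        final String value;
        Obj(int id, String value) { this.id = id; this.value = value; }
        @Override public String toString() { return "(" + id + value + ")"; }
    }

    // The question's comparator; 'swapped' switches the operands of Integer.compare.
    static Comparator<Obj> questionComparator(boolean swapped) {
        return (a, b) -> {
            int result = a.value.compareTo(b.value);
            if (a.id == b.id) {
                return 0;
            }
            if (result != 0) {
                return result;
            }
            return swapped ? Integer.compare(b.id, a.id) : Integer.compare(a.id, b.id);
        };
    }

    // Adds the four objects from the question in the original order.
    static String run(boolean swapped) {
        Set<Obj> set = new TreeSet<>(questionComparator(swapped));
        set.addAll(Arrays.asList(
                new Obj(1, "a"), new Obj(1, "b"), new Obj(2, "c"), new Obj(2, "a")));
        return set.toString();
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [(1a), (2c)]
        System.out.println(run(true));  // [(2a), (1a), (2c)]
    }
}
```

The underlying problem is that the comparator breaks the Comparator contract: compare(x, y) == 0 must imply that x and y compare the same way against every other element. Here (2a) and (2c) compare as equal, yet in the swapped variant (2a) sorts before (1a) while (2c) sorts after it, so whether a duplicate is detected depends on which entries happen to be visited.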

Q: You said one comparator can't do both tasks, but my first comparator in fact did both correctly.
Yes, but only for the given example. Add Obj obj3a to the set as I did and run your code. The returned sorted set is:

[(1a), (3a), (2c)]

This violates your requirement to sort equal values by id in descending order. Now they are ascending by id. Run my code and it returns the right order, as shown above.
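
A short sketch to check this claim: it runs the first comparator from the question (logging removed) over the original four objects plus (3, "a"). The class name and the run helper are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class ThirdObjDemo {

    // Same shape as the Obj class from the question.
    static class Obj {
        final int id;
        final String value;
        Obj(int id, String value) { this.id = id; this.value = value; }
        @Override public String toString() { return "(" + id + value + ")"; }
    }

    static String run() {
        // First comparator from the question: ids ascending as the tie-breaker.
        Set<Obj> set = new TreeSet<>((a, b) -> {
            int result = a.value.compareTo(b.value);
            if (a.id == b.id) {
                return 0;
            }
            return result == 0 ? Integer.compare(a.id, b.id) : result;
        });
        set.addAll(Arrays.asList(
                new Obj(1, "a"), new Obj(1, "b"), new Obj(2, "c"),
                new Obj(2, "a"), new Obj(3, "a")));
        return set.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // [(1a), (3a), (2c)]
    }
}
```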

Struggling with a Comparator some time ago, I got the following comment: "... it’s a great exercise, demonstrating how tricky manual comparator implementations can be ..." (source)
