从Java 8中的列表中提取重复的对象 [英] Extract duplicate objects from a List in Java 8

查看:265
本文介绍了从Java 8中的列表中提取重复的对象的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

此代码从原始列表中删除了重复项,但我想从原始列表中提取重复项->不删除它们(此程序包名称只是另一个项目的一部分):

This code removes duplicates from the original list, but I want to extract the duplicates from the original list -> not removing them (this package name is just part of another project):

给出:

一个人pojo:

package at.mavila.learn.kafka.kafkaexercises;

import org.apache.commons.lang3.builder.ToStringBuilder;

public class Person {

private final Long id;
private final String firstName;
private final String secondName;


private Person(final Builder builder) {
    this.id = builder.id;
    this.firstName = builder.firstName;
    this.secondName = builder.secondName;
}


public Long getId() {
    return id;
}

public String getFirstName() {
    return firstName;
}

public String getSecondName() {
    return secondName;
}

public static class Builder {

    private Long id;
    private String firstName;
    private String secondName;

    public Builder id(final Long builder) {
        this.id = builder;
        return this;
    }

    public Builder firstName(final String first) {
        this.firstName = first;
        return this;
    }

    public Builder secondName(final String second) {
        this.secondName = second;
        return this;
    }

    public Person build() {
        return new Person(this);
    }


}

@Override
public String toString() {
    return new ToStringBuilder(this)
            .append("id", id)
            .append("firstName", firstName)
            .append("secondName", secondName)
            .toString();
}
}

重复提取代码.

请注意,这里我们过滤了ID和名字以检索新列表,我在其他地方而不是我的地方看到了这段代码:

Notice here we filter the id and the first name to retrieve a new list, I saw this code someplace else, not mine:

package at.mavila.learn.kafka.kafkaexercises;

import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

import static java.util.Objects.isNull;

public final class DuplicatePersonFilter {


private DuplicatePersonFilter() {
    //No instances of this class
}

public static List<Person> getDuplicates(final List<Person> personList) {

   return personList
           .stream()
           .filter(duplicateByKey(Person::getId))
           .filter(duplicateByKey(Person::getFirstName))
           .collect(Collectors.toList());

}

private static <T> Predicate<T> duplicateByKey(final Function<? super T, Object> keyExtractor) {
    Map<Object,Boolean> seen = new ConcurrentHashMap<>();
    return t -> isNull(seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE));

}

}

测试代码. 如果运行此测试用例,您将获得[alex,lolita,elpidio,romualdo].

The test code. If you run this test case you will get [alex, lolita, elpidio, romualdo].

我希望得到[romualdo,otroRomualdo]作为给定id和firstName的提取重复项:

I would expect to get instead [romualdo, otroRomualdo] as the extracted duplicates given the id and the firstName:

package at.mavila.learn.kafka.kafkaexercises;


import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;

import static org.junit.Assert.*;

public class DuplicatePersonFilterTest {

private static final Logger LOGGER = LoggerFactory.getLogger(DuplicatePersonFilterTest.class);



@Test
public void testList(){

    Person alex = new Person.Builder().id(1L).firstName("alex").secondName("salgado").build();
    Person lolita = new Person.Builder().id(2L).firstName("lolita").secondName("llanero").build();
    Person elpidio = new Person.Builder().id(3L).firstName("elpidio").secondName("ramirez").build();
    Person romualdo = new Person.Builder().id(4L).firstName("romualdo").secondName("gomez").build();
    Person otroRomualdo = new Person.Builder().id(4L).firstName("romualdo").secondName("perez").build();


    List<Person> personList = new ArrayList<>();

    personList.add(alex);
    personList.add(lolita);
    personList.add(elpidio);
    personList.add(romualdo);
    personList.add(otroRomualdo);

    final List<Person> duplicates = DuplicatePersonFilter.getDuplicates(personList);

    LOGGER.info("Duplicates: {}",duplicates);

}

}

在我的工作中,我可以通过使用TreeMap和ArrayList的Comparator获得所需的结果,但这是先创建一个列表,然后对其进行过滤,然后将过滤器再次传递给新创建的列表,这看起来很肿,(和可能效率不高)

In my job I was able to get the desired result it by using Comparator using TreeMap and ArrayList, but this was creating a list then filtering it, passing the filter again to a newly created list, this looks bloated code, (and probably inefficient)

有人有更好的主意如何提取重复项吗?而不是将其删除.

Does someone has a better idea how to extract duplicates?, not remove them.

先谢谢了.

更新:

感谢大家的回答

要使用具有uniqueAttributes的相同方法删除重复项:

To remove the duplicate using same approach with the uniqueAttributes:

 public static List<Person> removeDuplicates(final List<Person> personList) {

    return personList.stream().collect(Collectors
            .collectingAndThen(Collectors.toCollection(() -> new TreeSet<>(Comparator.comparing(
                    PersonListFilters::uniqueAttributes))),
                    ArrayList::new));

}

 private static String uniqueAttributes(Person person){

    if(Objects.isNull(person)){
        return StringUtils.EMPTY;
    }



    return (person.getId()) + (person.getFirstName()) ;
}

推荐答案

要识别重复项,我知道的任何方法都不比

To indentify duplicates, no method I know of is better suited than Collectors.groupingBy(). This allows you to group the list into a map based on a condition of your choice.

您的情况是idfirstName的组合.让我们将此部分提取到Person中的自己的方法中:

Your condition is a combination of id and firstName. Let's extract this part into an own method in Person:

String uniqueAttributes() {
  return id + firstName;
}

getDuplicates()方法现在非常简单:

public static List<Person> getDuplicates(final List<Person> personList) {
  return getDuplicatesMap(personList).values().stream()
      .filter(duplicates -> duplicates.size() > 1)
      .flatMap(Collection::stream)
      .collect(Collectors.toList());
}

private static Map<String, List<Person>> getDuplicatesMap(List<Person> personList) {
  return personList.stream().collect(groupingBy(Person::uniqueAttributes));
}

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆