Extract duplicate objects from a List in Java 8
Problem description
This code removes duplicates from the original list, but I want to extract the duplicates from the original list, not remove them (this package name is just part of another project):
Given:
A Person POJO:
package at.mavila.learn.kafka.kafkaexercises;

import org.apache.commons.lang3.builder.ToStringBuilder;

public class Person {

    private final Long id;
    private final String firstName;
    private final String secondName;

    private Person(final Builder builder) {
        this.id = builder.id;
        this.firstName = builder.firstName;
        this.secondName = builder.secondName;
    }

    public Long getId() {
        return id;
    }

    public String getFirstName() {
        return firstName;
    }

    public String getSecondName() {
        return secondName;
    }

    public static class Builder {

        private Long id;
        private String firstName;
        private String secondName;

        public Builder id(final Long builder) {
            this.id = builder;
            return this;
        }

        public Builder firstName(final String first) {
            this.firstName = first;
            return this;
        }

        public Builder secondName(final String second) {
            this.secondName = second;
            return this;
        }

        public Person build() {
            return new Person(this);
        }
    }

    @Override
    public String toString() {
        return new ToStringBuilder(this)
                .append("id", id)
                .append("firstName", firstName)
                .append("secondName", secondName)
                .toString();
    }
}
The duplicate extraction code:
Notice that here we filter by the id and the first name to retrieve a new list. I saw this code somewhere else, it is not mine:
package at.mavila.learn.kafka.kafkaexercises;

import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

import static java.util.Objects.isNull;

public final class DuplicatePersonFilter {

    private DuplicatePersonFilter() {
        //No instances of this class
    }

    public static List<Person> getDuplicates(final List<Person> personList) {
        return personList
                .stream()
                .filter(duplicateByKey(Person::getId))
                .filter(duplicateByKey(Person::getFirstName))
                .collect(Collectors.toList());
    }

    private static <T> Predicate<T> duplicateByKey(final Function<? super T, Object> keyExtractor) {
        Map<Object, Boolean> seen = new ConcurrentHashMap<>();
        return t -> isNull(seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE));
    }
}
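As an aside, the reason this approach misbehaves is that duplicateByKey is really a distinct-by-key predicate: putIfAbsent returns null only the first time a key is seen, so the filter keeps first occurrences and drops the later duplicates. A minimal, self-contained sketch (using plain strings as a stand-in for Person) that illustrates this:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class DistinctByKeyDemo {

    // Same predicate as duplicateByKey above: putIfAbsent returns null
    // only the FIRST time a key is seen, so the predicate is true once per key.
    static <T> Predicate<T> duplicateByKey(final Function<? super T, Object> keyExtractor) {
        Map<Object, Boolean> seen = new ConcurrentHashMap<>();
        return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("alex", "lolita", "elpidio", "romualdo", "romualdo");
        List<String> firstOccurrences = names.stream()
                .filter(duplicateByKey(Function.identity()))
                .collect(Collectors.toList());
        // Keeps only the first occurrence of each key, i.e. it removes
        // duplicates instead of extracting them
        System.out.println(firstOccurrences); // [alex, lolita, elpidio, romualdo]
    }
}
```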
The test code. If you run this test case you will get [alex, lolita, elpidio, romualdo].
I would instead expect to get [romualdo, otroRomualdo] as the duplicates extracted by id and firstName:
package at.mavila.learn.kafka.kafkaexercises;

import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;

import static org.junit.Assert.*;

public class DuplicatePersonFilterTest {

    private static final Logger LOGGER = LoggerFactory.getLogger(DuplicatePersonFilterTest.class);

    @Test
    public void testList() {
        Person alex = new Person.Builder().id(1L).firstName("alex").secondName("salgado").build();
        Person lolita = new Person.Builder().id(2L).firstName("lolita").secondName("llanero").build();
        Person elpidio = new Person.Builder().id(3L).firstName("elpidio").secondName("ramirez").build();
        Person romualdo = new Person.Builder().id(4L).firstName("romualdo").secondName("gomez").build();
        Person otroRomualdo = new Person.Builder().id(4L).firstName("romualdo").secondName("perez").build();

        List<Person> personList = new ArrayList<>();
        personList.add(alex);
        personList.add(lolita);
        personList.add(elpidio);
        personList.add(romualdo);
        personList.add(otroRomualdo);

        final List<Person> duplicates = DuplicatePersonFilter.getDuplicates(personList);
        LOGGER.info("Duplicates: {}", duplicates);
    }
}
In my job I was able to get the desired result by using a Comparator with a TreeMap and an ArrayList, but this meant creating a list, filtering it, and passing the filter again to a newly created list. This looks bloated (and is probably inefficient).
Does someone have a better idea how to extract the duplicates, rather than remove them?
Thanks in advance.
Update:
Thanks everyone for your answers.
To remove the duplicates using the same approach, with uniqueAttributes:
public static List<Person> removeDuplicates(final List<Person> personList) {
    return personList.stream().collect(Collectors
            .collectingAndThen(Collectors.toCollection(() -> new TreeSet<>(Comparator.comparing(
                    PersonListFilters::uniqueAttributes))),
                    ArrayList::new));
}

private static String uniqueAttributes(Person person) {
    if (Objects.isNull(person)) {
        return StringUtils.EMPTY;
    }
    return (person.getId()) + (person.getFirstName());
}
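A self-contained sketch of this TreeSet-based removal, runnable on its own (assumptions: a simplified Person stand-in instead of the builder-based class, and a plain "" instead of StringUtils.EMPTY to avoid the commons-lang dependency). The TreeSet is ordered by the unique key, so it keeps only one element per key:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class RemoveDuplicatesDemo {

    // Simplified stand-in for the Person class from the question
    static class Person {
        final Long id;
        final String firstName;

        Person(Long id, String firstName) {
            this.id = id;
            this.firstName = firstName;
        }

        @Override
        public String toString() {
            return firstName + ":" + id;
        }
    }

    // Same idea as removeDuplicates above: a TreeSet ordered by the unique
    // key retains only the first element inserted for each key.
    static List<Person> removeDuplicates(final List<Person> personList) {
        return personList.stream().collect(Collectors
                .collectingAndThen(Collectors.toCollection(() -> new TreeSet<>(
                        Comparator.comparing(RemoveDuplicatesDemo::uniqueAttributes))),
                        ArrayList::new));
    }

    static String uniqueAttributes(Person person) {
        if (Objects.isNull(person)) {
            return "";
        }
        return person.id + person.firstName;
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person(4L, "romualdo"),
                new Person(4L, "romualdo"),
                new Person(1L, "alex"));
        // The two entries with key "4romualdo" collapse into one;
        // note the result is ordered by key, not by insertion order
        System.out.println(removeDuplicates(people)); // [alex:1, romualdo:4]
    }
}
```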
Recommended answer
To identify duplicates, no method I know of is better suited than Collectors.groupingBy(). This allows you to group the list into a map based on a condition of your choice.
Your condition is a combination of id and firstName. Let's extract this part into its own method in Person:
String uniqueAttributes() {
    return id + firstName;
}
The getDuplicates() method is now quite simple:
public static List<Person> getDuplicates(final List<Person> personList) {
    return getDuplicatesMap(personList).values().stream()
            .filter(duplicates -> duplicates.size() > 1)
            .flatMap(Collection::stream)
            .collect(Collectors.toList());
}

private static Map<String, List<Person>> getDuplicatesMap(List<Person> personList) {
    return personList.stream().collect(groupingBy(Person::uniqueAttributes));
}
- The first line calls another method, getDuplicatesMap(), to create the map as explained above.
- It then streams over the values of the map, which are lists of persons.
- It filters out everything except lists with a size greater than 1, i.e. it finds the duplicates.
- Finally, flatMap() is used to flatten the stream of lists into one single stream of persons, and collects the stream to a list.
An alternative, if you truly identify persons as equal when they have the same id and firstName, is to go with the solution by Jonathan Johx and implement an equals() method.