Insight into Collections removeAll method

Question

I have a list of size ~200k. I am facing some issues while filtering the list.

Here is the implementation:

public List<> filterList(List<> listToBeFiltered) {
    List<> removeElementsFromList = listToBeFiltered.parallelStream()
            .filter(/* some filtering logic */)
            .collect(Collectors.toList());
    listToBeFiltered.removeAll(removeElementsFromList);
    return listToBeFiltered;
}

The problem I face with the code is that the program will remain stuck at the removeAll statement when the removeElementsFromList approaches the size of listToBeFiltered. Any insight/alternate solution is much appreciated.

Solution

The problem is that the x.removeAll(y) operation is O(n×m), where n is the size of the collection x, and m is the size of the collection y (i.e., O(|x|×|y|)).

The removeAll method is basically a nested scan. In the standard implementations (ArrayList, for example), it walks x and, for each element, calls y.contains(...), which is itself a linear search when y is a list; any match is removed from x. It would be much more efficient if you could do that in one pass with constant-time lookups.
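To make the cost concrete, here is a simplified stand-in for that nested scan, with x as the outer loop (the way ArrayList does it). It is not the JDK source: the class and method names are made up for illustration, and it assumes y is a List so that every membership check is a linear search. It counts the equality comparisons so the growth is visible.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

final class NaiveRemoveAllSketch {

    // Hypothetical helper (not a JDK method): removes from x every element
    // that also occurs in y and returns how many equality checks were made.
    static <T> long removeAllCountingComparisons(List<T> x, List<T> y) {
        long comparisons = 0;
        Iterator<T> it = x.iterator();
        while (it.hasNext()) {
            T candidate = it.next();
            boolean found = false;
            // Linear scan of y, which is what List.contains amounts to.
            for (T other : y) {
                comparisons++;
                if (Objects.equals(candidate, other)) {
                    found = true;
                    break;
                }
            }
            if (found) {
                // Iterator.remove on an ArrayList shifts the tail as well;
                // the real ArrayList.removeAll compacts in place instead.
                it.remove();
            }
        }
        return comparisons;
    }
}
```

In the worst case every element of x is compared against every element of y, which is where the O(n×m) bound comes from.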

Assuming you're using Java 8, there's a much more efficient way to do this:

List<Integer> xs = new ArrayList<>();
// TODO: initialize xs with a bunch of values
List<Integer> ys = new ArrayList<>();
// TODO: initialize ys with a bunch of values
Set<Integer> ysSet = new HashSet<>(ys);
List<Integer> xsPrime = xs.stream()
    .filter(x -> !ysSet.contains(x))
    .collect(Collectors.toList());

For xs of size 100k and ys of size 66k, using removeAll took about 5500ms, whereas using the above method only took about 8ms. I would expect the difference to be even more pronounced when you scale up to 200k, due to the quadratic complexity of removeAll.

In contrast, the complexity of the filter version used above is going to be O(n+m), since it's O(m) to build the HashSet of all the values in ys, and then O(n) to iterate over all the values of xs to make sure none are contained in the new ysSet. (This is of course assuming that a HashSet lookup is O(1).)
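If you want to try the comparison yourself, a rough harness along these lines should do. It is only a sketch: the class name, the range helper, and the input sizes are my own choices rather than the data behind the numbers quoted above, and a single System.nanoTime measurement with no warm-up is not a rigorous benchmark.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class RemoveAllComparison {

    // Illustrative input: the integers 0..size-1.
    static List<Integer> range(int size) {
        List<Integer> values = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            values.add(i);
        }
        return values;
    }

    // List-against-list removeAll, applied to a copy so xs stays untouched.
    static List<Integer> viaRemoveAll(List<Integer> xs, List<Integer> ys) {
        List<Integer> copy = new ArrayList<>(xs);
        copy.removeAll(ys);
        return copy;
    }

    // O(m) to build the set, then O(n) lookups while filtering.
    static List<Integer> viaHashSetFilter(List<Integer> xs, List<Integer> ys) {
        Set<Integer> ysSet = new HashSet<>(ys);
        return xs.stream()
                .filter(x -> !ysSet.contains(x))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> xs = range(20_000);
        List<Integer> ys = range(13_000);

        long t0 = System.nanoTime();
        List<Integer> a = viaRemoveAll(xs, ys);
        long t1 = System.nanoTime();
        List<Integer> b = viaHashSetFilter(xs, ys);
        long t2 = System.nanoTime();

        System.out.println("same result: " + a.equals(b));
        System.out.printf("removeAll: %d ms, HashSet filter: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

Both methods return the same elements in the same order; only the time taken differs, and the gap widens as the two lists grow.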


Looking back at your question again, I realize you're already using filter... In that case, I suggest just inverting your filter logic, and then resetting the passed-in list's values to the filtered values:

public List<> filterList(List<> listToBeFiltered){
    List<> filteredList = listToBeFiltered.parallelStream()
        .filter(/* some inverted filtering logic */)
        .collect(Collectors.toList());
    listToBeFiltered.clear();
    listToBeFiltered.addAll(filteredList);
    return listToBeFiltered;
}

If you don't need to mutate the original list, then you can just return filteredList directly. (That would be my preferred solution anyway.)
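If the caller really does need the same list instance mutated, Java 8's Collection.removeIf is another option; I am swapping it in here as an alternative to clear() followed by addAll(). For an ArrayList it drops the matching elements in a single pass. The generic signature and the shouldRemove predicate below are placeholders, since the question doesn't show the real element type or filtering logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class RemoveIfSketch {

    // shouldRemove stands in for the question's unspecified "filtering logic".
    static <T> List<T> filterList(List<T> listToBeFiltered,
                                  Predicate<? super T> shouldRemove) {
        // Mutates the list in place; the list must support removal.
        listToBeFiltered.removeIf(shouldRemove);
        return listToBeFiltered;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6));
        filterList(values, n -> n % 2 == 0);
        System.out.println(values); // prints [1, 3, 5]
    }
}
```

Note that removeIf evaluates the predicate sequentially, so this trades the parallel stream for not needing an intermediate list.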


I just ran my tests again, and this time I added another version that uses a loop instead of streams:

Set<Integer> ysSet = new HashSet<>(ys);
List<Integer> xsPrime = new ArrayList<>();
for (Integer x : xs) {
    if (!ysSet.contains(x)) {
        xsPrime.add(x);
    }
}
return xsPrime;

This version finished in about 7ms instead of 8ms. Since that's only marginally faster than the stream version (especially considering the original version using removeAll was 3 orders of magnitude slower), I'd stick with the stream version—especially because you can take advantage of parallelism there (as you're already doing with parallelStream).
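For completeness, here is a sketch of the same set-based filter on a parallel stream. The class and method names are illustrative. It relies on the HashSet being only read, never modified, while the stream runs, and on Collectors.toList() preserving encounter order for an ordered source such as a List.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class ParallelFilterSketch {

    // Returns the elements of xs that do not occur in ys, in their original order.
    static <T> List<T> without(List<T> xs, List<T> ys) {
        // Built once up front, then only read from the worker threads.
        Set<T> ysSet = new HashSet<>(ys);
        return xs.parallelStream()
                .filter(x -> !ysSet.contains(x))
                .collect(Collectors.toList());
    }
}
```

Whether the parallel version actually beats the sequential one depends on the list size and on how expensive the filter is, so it is worth measuring on your own data.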
