Insight into Collections removeAll method
Problem description
I have a list of about 200k elements, and I'm running into some issues when filtering it.
Here is the implementation:
public <T> List<T> filterList(List<T> listToBeFiltered) {
    List<T> removeElementsFromList = listToBeFiltered.parallelStream()
            .filter(/* some filtering logic */)
            .collect(Collectors.toList());
    listToBeFiltered.removeAll(removeElementsFromList);
    return listToBeFiltered;
}
The problem is that the program gets stuck at the removeAll call when removeElementsFromList approaches the size of listToBeFiltered. Any insight or alternative solution would be much appreciated.
The problem is that the x.removeAll(y) operation is O(n×m), where n is the size of the collection x and m is the size of the collection y (i.e., O(|x|×|y|)).
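The removeAll method basically iterates over x and, for each element, calls y.contains(...) to decide whether to remove it. Because y is a List, each of those contains calls is itself a linear scan over y, and that is where the n×m comes from. It would be much more efficient if each membership check took constant time, so that each collection only had to be traversed once.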
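To see where the cost comes from, here is a minimal sketch. It assumes x is a java.util.ArrayList, whose removeAll calls contains on its argument once per element of x. The class name RemoveAllCost and the sample values are illustrative only. Passing a HashSet instead of a List as the argument turns each membership check into an average O(1) lookup while keeping the in-place mutation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Illustrative class; not part of the original answer.
class RemoveAllCost {

    // Quadratic: ArrayList.removeAll calls y.contains(e) for every e in x,
    // and List.contains is a linear scan, so this is O(|x| * |y|).
    static List<Integer> removeWithList(List<Integer> x, List<Integer> y) {
        List<Integer> copy = new ArrayList<>(x);
        copy.removeAll(y);
        return copy;
    }

    // Linear on average: the same removeAll call, but each contains check
    // now hits a HashSet, so the whole call is roughly O(|x| + |y|).
    static List<Integer> removeWithSet(List<Integer> x, List<Integer> y) {
        List<Integer> copy = new ArrayList<>(x);
        copy.removeAll(new HashSet<>(y));
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> x = Arrays.asList(1, 2, 3, 4, 5, 2);
        List<Integer> y = Arrays.asList(2, 4);
        System.out.println(removeWithList(x, y)); // [1, 3, 5]
        System.out.println(removeWithSet(x, y));  // [1, 3, 5]
    }
}
```

Both variants remove every occurrence of a matching value (note that both copies of 2 disappear), so swapping the argument type changes only the cost, not the result.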
Assuming you're using Java 8, there's a much more efficient way to do this:
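import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

List<Integer> xs = new ArrayList<>();
// TODO: initialize xs with a bunch of values
List<Integer> ys = new ArrayList<>();
// TODO: initialize ys with a bunch of values
Set<Integer> ysSet = new HashSet<>(ys); // O(m) to build; contains() is O(1) on average
List<Integer> xsPrime = xs.stream()
        .filter(x -> !ysSet.contains(x)) // keep only values that are not in ys
        .collect(Collectors.toList());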
For xs of size 100k and ys of size 66k, using removeAll took about 5500 ms, whereas the method above took only about 8 ms. I would expect the difference to be even more pronounced when you scale up to 200k, because of the quadratic complexity of removeAll.
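If you want to reproduce a comparison like this on your own machine, a rough timing sketch could look like the following. The class name, the random data, and the value range are assumptions, not the author's original benchmark. System.nanoTime measurements without JIT warm-up are only indicative, so expect your numbers to differ from the ones quoted above. A dedicated harness such as JMH is more reliable.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.stream.Collectors;

// Rough, illustrative timing harness; not the author's original benchmark.
class RemoveAllBenchmark {

    // Builds a list of `size` random ints in [0, bound) using a fixed seed.
    static List<Integer> randomList(int size, int bound, long seed) {
        Random random = new Random(seed);
        List<Integer> list = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            list.add(random.nextInt(bound));
        }
        return list;
    }

    // Original approach: O(|xs| * |ys|) because ys is a List.
    static List<Integer> viaRemoveAll(List<Integer> xs, List<Integer> ys) {
        List<Integer> copy = new ArrayList<>(xs);
        copy.removeAll(ys);
        return copy;
    }

    // Answer's approach: build a HashSet once, then filter in one pass.
    static List<Integer> viaHashSetFilter(List<Integer> xs, List<Integer> ys) {
        Set<Integer> ysSet = new HashSet<>(ys);
        return xs.stream()
                .filter(x -> !ysSet.contains(x))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Defaults are kept small so a run is quick;
        // pass "100000 66000" to use the sizes quoted in the answer.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 20_000;
        int m = args.length > 1 ? Integer.parseInt(args[1]) : 13_200;
        List<Integer> xs = randomList(n, 1_000_000, 1L);
        List<Integer> ys = randomList(m, 1_000_000, 2L);

        long t0 = System.nanoTime();
        List<Integer> a = viaRemoveAll(xs, ys);
        long t1 = System.nanoTime();
        List<Integer> b = viaHashSetFilter(xs, ys);
        long t2 = System.nanoTime();

        System.out.printf("removeAll:        %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("HashSet + filter: %d ms%n", (t2 - t1) / 1_000_000);
        System.out.println("same result: " + a.equals(b));
    }
}
```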
In contrast, the complexity of the filter version above is O(n+m): it's O(m) to build the HashSet of all the values in ys, and then O(n) to iterate over all the values of xs and make sure none are contained in the new ysSet. (This of course assumes that a HashSet lookup is O(1).)
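As a rough back-of-the-envelope comparison of the two bounds, the sizes from the benchmark above and the question's case where both lists approach 200k can be plugged in. This counts one equality or hash check per unit and ignores constant factors, so the figures are worst-case operation counts, not time predictions:

```latex
\begin{aligned}
n = 10^5,\ m = 6.6\times10^4 &:\quad n\times m = 6.6\times10^{9}, \qquad n+m = 1.66\times10^{5}\\
n = m = 2\times10^5 &:\quad n\times m = 4\times10^{10}, \qquad n+m = 4\times10^{5}
\end{aligned}
```

Doubling both sizes quadruples n×m but only doubles n+m, which is why the gap is expected to widen as the lists grow.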
Looking back at your question, I realize you're already using filter. In that case, I suggest simply inverting your filter logic and then resetting the passed-in list's contents to the filtered values:
public <T> List<T> filterList(List<T> listToBeFiltered) {
    List<T> filteredList = listToBeFiltered.parallelStream()
            .filter(/* some inverted filtering logic */)
            .collect(Collectors.toList());
    listToBeFiltered.clear();
    listToBeFiltered.addAll(filteredList);
    return listToBeFiltered;
}
If you don't need to mutate the original list, you can just return filteredList directly. (That would be my preferred solution anyway.)
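A self-contained sketch of that suggestion follows. It assumes the removal criterion can be passed in as a java.util.function.Predicate. The class InvertedFilter, its methods, and the parameter shouldRemove are hypothetical, with shouldRemove standing in for "some filtering logic". Predicate.negate() does the inverting, so the stream keeps exactly the elements the original code would have kept. Both the preferred non-mutating variant and the clear-and-addAll variant are shown:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical helper class; "shouldRemove" stands in for "some filtering logic".
class InvertedFilter {

    // Preferred variant: leaves the input untouched and returns a new list
    // holding only the elements that do NOT match shouldRemove.
    static <T> List<T> filtered(List<T> list, Predicate<T> shouldRemove) {
        return list.parallelStream()
                .filter(shouldRemove.negate())  // the inverted filter
                .collect(Collectors.toList());  // keeps the original order
    }

    // Mutating variant: replaces the contents of the passed-in list
    // (which must be modifiable) with the kept elements.
    static <T> List<T> filterInPlace(List<T> list, Predicate<T> shouldRemove) {
        List<T> kept = filtered(list, shouldRemove);
        list.clear();
        list.addAll(kept);
        return list;
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6));
        Predicate<Integer> isEven = n -> n % 2 == 0; // example criterion only

        System.out.println(filtered(numbers, isEven));      // [1, 3, 5]
        System.out.println(numbers);                        // [1, 2, 3, 4, 5, 6]
        System.out.println(filterInPlace(numbers, isEven)); // [1, 3, 5]
        System.out.println(numbers);                        // [1, 3, 5]
    }
}
```

Either way the work is a single O(n) pass, with no removeAll involved.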
I just ran my tests again, and this time I added another version that uses a loop instead of streams:
Set<Integer> ysSet = new HashSet<>(ys); // built once, O(m)
List<Integer> xsPrime = new ArrayList<>();
for (Integer x : xs) {
    if (!ysSet.contains(x)) { // O(1) average lookup
        xsPrime.add(x);
    }
}
return xsPrime;
This version finished in about 7 ms instead of 8 ms. Since that's only marginally faster than the stream version (especially considering that the original removeAll version was three orders of magnitude slower), I'd stick with the stream version, particularly because you can take advantage of parallelism there (as you're already doing with parallelStream).
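For completeness, here is a sketch of the parallel variant mentioned above. It assumes the HashSet is fully built before the stream starts and is only read afterwards. Concurrent reads of an unmodified HashSet are safe, but concurrent writes would not be. The class and method names are illustrative. Whether parallelism actually helps for a job that already takes a few milliseconds depends on the machine and the data, so it is worth measuring before relying on it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative class; names are not from the original answer.
class ParallelSubtract {

    // Returns the elements of xs that do not appear in ys, in xs's order.
    static <T> List<T> subtract(List<T> xs, List<T> ys) {
        // Built once on the calling thread, then only read by the workers.
        Set<T> ysSet = new HashSet<>(ys);
        return xs.parallelStream()
                .filter(x -> !ysSet.contains(x))
                .collect(Collectors.toList()); // preserves encounter order
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(5, 1, 4, 2, 3);
        List<Integer> ys = Arrays.asList(4, 2);
        System.out.println(subtract(xs, ys)); // [5, 1, 3]
    }
}
```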