HashSet removeAll方法非常慢 [英] HashSet removeAll method is surprisingly slow

查看:245
本文介绍了HashSet removeAll方法非常慢的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个集合 - 一个HashSet我想从中删除一些项目......removals集合中的所有项目都不在原始集合中。

I have a set – a HashSet I want to remove some items from it… none of the items in the "removals" collection will be in the original set.

我在命令行中指定source集的大小和removals集合的大小,并构建它们。源集仅包含非负整数;删除集仅包含负整数。
我测量使用System.currentTimeMillis()移除所有元素所需的时间,这不是世界上最精确的秒表,但在这种情况下绰绰有余,正如您将看到的那样。这是代码:

I specify the size of the "source" set and the size of the "removals" collection on the command line, and build both of them. The source set contains only non-negative integers; the removals set contains only negative integers. I measure how long it takes to remove all the elements using System.currentTimeMillis(), which isn’t the world most accurate stopwatch but is more than adequate in this case, as you’ll see. Here’s the code:

import java.util.*;
public class Test 
{ 
 public static void main(String[] args) 
 { 
    int sourceSize = Integer.parseInt(args[0]); 
    int removalsSize = Integer.parseInt(args[1]); 

    Set<Integer> source = new HashSet<Integer>(); 
    Collection<Integer> removals = new ArrayList<Integer>(); 

    for (int i = 0; i < sourceSize; i++) 
    { 
        source.add(i); 
    } 
    for (int i = 1; i <= removalsSize; i++) 
    { 
        removals.add(-i); 
    } 

    long start = System.currentTimeMillis(); 
    source.removeAll(removals); 
    long end = System.currentTimeMillis(); 
    System.out.println("Time taken: " + (end – start) + "ms"); 
} 
 }

让我们从轻松的工作开始:包含100个项目的源代码,以及100个要删除的内容:

Let’s start off by giving it an easy job: a source set of 100 items, and 100 to remove:

 c:UsersJonTest>java Test 100 100
 Time taken: 1ms

好的,这比我预期的要快。

Okay, That's fast as I expected.

接下来我尝试删除一百万件商品和300,000件商品?

Next i tried source of one million items and 300,000 items to remove?

c:UsersJonTest>java Test 1000000 300000
Time taken: 38ms

仍然看起来很快。
现在让它变得更容易 - 300,000个源项目和300,000个删除:

That still seems pretty speedy. Now make it a bit easier – 300,000 source items and 300,000 removals:

c:UsersJonTest>java Test 300000 300000
Time taken: 178131ms

将近三分钟?

真的很困惑!!有人可以解释为什么会发生这种情况。

Really confused !! can some one explain why this is happening.

推荐答案

javadoc


此实现通过在每个集合上调用size方法来确定该集合和指定集合中较小的集合。 如果此集合具有较少的元素 ,则实现将迭代此集合,依次检查迭代器返回的每个元素,以查看 是否包含在指定的集合中 。如果包含它,则使用迭代器的remove方法从该集合中删除它。如果指定的集合具有较少的元素,则实现迭代指定的集合,使用此set的remove方法从迭代器返回的每个元素中删除此集合。

This implementation determines which is the smaller of this set and the specified collection, by invoking the size method on each. If this set has fewer elements, then the implementation iterates over this set, checking each element returned by the iterator in turn to see if it is contained in the specified collection. If it is so contained, it is removed from this set with the iterator's remove method. If the specified collection has fewer elements, then the implementation iterates over the specified collection, removing from this set each element returned by the iterator, using this set's remove method.

这在实践中意味着什么,当你致电 source.removeAll(去除);

What this means in practice, when you call source.removeAll(removals);:


  • 如果删除集合的大小小于 source ,调用删除 HashSet 的方法,这很快。

  • if the removals collection is of a smaller size than source, the remove method of HashSet is called, which is fast.

如果删除集合的大小等于或大于,则<$ c调用$ c> removals.contains ,这对于ArrayList来说很慢。

if the removals collection is of equal or larger size than the source, then removals.contains is called, which is slow for an ArrayList.

快速修复:

Collection<Integer> removals = new HashSet<Integer>();

请注意一个开放的bug ,与你描述的非常相似。底线似乎是它可能是一个糟糕的选择,但不能更改,因为它记录在javadoc中。

Note that there is an open bug that is very similar to what you describe. The bottom line seems to be that it is probably a poor choice but can't be changed because it is documented in the javadoc.

作为参考,这是 removeAll 的代码(在Java 8中 - 尚未检查其他版本):

For reference, this is the code of removeAll (in Java 8 - haven't checked other versions):

public boolean removeAll(Collection<?> c) {
    Objects.requireNonNull(c);
    boolean modified = false;

    if (size() > c.size()) {
        for (Iterator<?> i = c.iterator(); i.hasNext(); )
            modified |= remove(i.next());
    } else {
        for (Iterator<?> i = iterator(); i.hasNext(); ) {
            if (c.contains(i.next())) {
                i.remove();
                modified = true;
            }
        }
    }
    return modified;
}

这篇关于HashSet removeAll方法非常慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆