HashSet< T> .removeAll方法出奇地慢 [英] The HashSet<T>.removeAll method is surprisingly slow

查看:116
本文介绍了HashSet< T> .removeAll方法出奇地慢的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

Jon Skeet最近在他的博客上提出了一个有趣的编程主题:我的抽象中有一个洞,亲爱的丽莎,亲爱的丽莎" (强调):

Jon Skeet recently raised an interesting programming topic on his blog: "There's a hole in my abstraction, dear Liza, dear Liza" (emphasis added):

实际上我有一套-HashSet.我想从中删除一些项目……许多项目可能不存在.实际上,在我们的测试案例中,移除"项中的项集合将保留原始设置.这听起来(实际上确实是 )非常容易编写.毕竟,我们已经

I have a set – a HashSet, in fact. I want to remove some items from it… and many of the items may well not exist. In fact, in our test case, none of the items in the "removals" collection will be in the original set. This sounds – and indeed is – extremely easy to code. After all, we’ve got Set<T>.removeAll to help us, right?

我们指定来源"的大小设置和清除"的大小并在命令行上构建它们.源集仅包含非负整数;清除集仅包含负整数.如您所见,我们使用System.currentTimeMillis()来衡量删除所有元素所需的时间,这并不是世界上最精确的秒表,但在这种情况下已经足够了.这是代码:

We specify the size of the "source" set and the size of the "removals" collection on the command line, and build both of them. The source set contains only non-negative integers; the removals set contains only negative integers. We measure how long it takes to remove all the elements using System.currentTimeMillis(), which isn’t the world most accurate stopwatch but is more than adequate in this case, as you’ll see. Here’s the code:

import java.util.*;
public class Test 
{ 
    public static void main(String[] args) 
    { 
       int sourceSize = Integer.parseInt(args[0]); 
       int removalsSize = Integer.parseInt(args[1]); 
        
       Set<Integer> source = new HashSet<Integer>(); 
       Collection<Integer> removals = new ArrayList<Integer>(); 
        
       for (int i = 0; i < sourceSize; i++) 
       { 
           source.add(i); 
       } 
       for (int i = 1; i <= removalsSize; i++) 
       { 
           removals.add(-i); 
       } 
        
       long start = System.currentTimeMillis(); 
       source.removeAll(removals); 
       long end = System.currentTimeMillis(); 
       System.out.println("Time taken: " + (end - start) + "ms"); 
    }
}

让我们开始做一个简单的工作:一个包含100个项目的源集,并删除100个项目:

Let’s start off by giving it an easy job: a source set of 100 items, and 100 to remove:

c:UsersJonTest>java Test 100 100
Time taken: 1ms

好吧,所以我们没想到它会变慢……很明显,我们可以提高一些速度.一百万个项目和30万个要删除的项目的来源怎么样?

Okay, so we hadn’t expected it to be slow… clearly we can ramp things up a bit. How about a source of one million items and 300,000 items to remove?

c:UsersJonTest>java Test 1000000 300000
Time taken: 38ms

嗯.看起来还是很迅速的.现在,我觉得自己有点残酷,要求它执行所有删除操作.让我们轻松一点-300,000个源项目和300,000个删除项目:

Hmm. That still seems pretty speedy. Now I feel I’ve been a little bit cruel, asking it to do all that removing. Let’s make it a bit easier – 300,000 source items and 300,000 removals:

c:UsersJonTest>java Test 300000 300000
Time taken: 178131ms

打扰一下?将近三分钟吗? kes!当然,从较小的集合中删除项目比我们在38ms内管理的项目要容易得多?

Excuse me? Nearly three minutes? Yikes! Surely it ought to be easier to remove items from a smaller collection than the one we managed in 38ms?

有人可以解释为什么会这样吗?为什么HashSet<T>.removeAll方法这么慢?

Can someone explain why this is happening? Why is the HashSet<T>.removeAll method so slow?

推荐答案

javadoc :

此实现通过在每个集合上调用size方法来确定该集合和指定集合中的较小者. 如果该集合具有较少的元素 ,则实现将对此集合进行迭代,依次检查迭代器返回的每个元素,以查看 是否包含在指定的集合中 .如果包含此类内容,则使用迭代器的remove方法将其从此集合中删除.如果指定的集合具有较少的元素,则实现将迭代指定的集合,并使用此集合的remove方法从该集合中删除迭代器返回的每个元素.

This implementation determines which is the smaller of this set and the specified collection, by invoking the size method on each. If this set has fewer elements, then the implementation iterates over this set, checking each element returned by the iterator in turn to see if it is contained in the specified collection. If it is so contained, it is removed from this set with the iterator's remove method. If the specified collection has fewer elements, then the implementation iterates over the specified collection, removing from this set each element returned by the iterator, using this set's remove method.

实际上,当您致电source.removeAll(removals);时:

  • 如果removals集合的大小小于source,则将调用HashSetremove方法,这是快速的.

  • if the removals collection is of a smaller size than source, the remove method of HashSet is called, which is fast.

如果removals集合的大小等于或大于source的大小,则将调用removals.contains,这对于ArrayList来说很慢.

if the removals collection is of equal or larger size than the source, then removals.contains is called, which is slow for an ArrayList.

快速修复:

Collection<Integer> removals = new HashSet<Integer>();

请注意,有一个一个公开的错误与您所描述的非常相似.底线似乎是它可能是一个糟糕的选择,但由于它已在javadoc中进行了记录,因此无法更改.

Note that there is an open bug that is very similar to what you describe. The bottom line seems to be that it is probably a poor choice but can't be changed because it is documented in the javadoc.

作为参考,这是removeAll的代码(在Java 8中-尚未检查其他版本):

For reference, this is the code of removeAll (in Java 8 - haven't checked other versions):

public boolean removeAll(Collection<?> c) {
    Objects.requireNonNull(c);
    boolean modified = false;

    if (size() > c.size()) {
        for (Iterator<?> i = c.iterator(); i.hasNext(); )
            modified |= remove(i.next());
    } else {
        for (Iterator<?> i = iterator(); i.hasNext(); ) {
            if (c.contains(i.next())) {
                i.remove();
                modified = true;
            }
        }
    }
    return modified;
}

这篇关于HashSet&lt; T&gt; .removeAll方法出奇地慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆