番石榴Sets.intersection不良表现 [英] Guava Sets.intersection bad performance

查看:73
本文介绍了番石榴Sets.intersection不良表现的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我今天在生产中遇到了一个奇怪的问题.尽管我喜欢Guava,但我遇到了一个用例,其中Guava的 Sets.intersection()表现很差.我已经编写了示例代码:

I've encountered a strange problem in production today. Though I love Guava, I ran into a use case where Guava's Sets.intersection() was performing pretty badly. I've written a sample code:

Set<Long> cache = new HashSet<>();
for (long i = 0; i < 1000000; i++) {
    cache.add(i);
}
Set<Long> keys = new HashSet<>();
for (long i = 0; i < 100; i++) {
    keys.add(i);
}
long start = System.currentTimeMillis();
Set<Long> foundKeys = new HashSet<>();
for (Long key : keys) {
    if (cache.contains(key)) {
        foundKeys.add(key);
    }
}
System.out.println("Java search: " + (System.currentTimeMillis() - start));
start = System.currentTimeMillis();
SetView<Long> intersection = Sets.intersection(keys, cache);
System.out.println("Guava search: " + (System.currentTimeMillis() - start));

我试图创建一个类似的生产场景,其中有一个密钥缓存,并且我正在寻找缓存中存在的所有密钥.奇怪的是,番石榴搜索比Java搜索花费了很多时间.运行完之后,我得到了:

I've tried to create a similar production scenario where I've a keys cache, and I'm looking for all the keys present in the cache. Strangely, Guava search is taking a lot longer than Java search. After running this I got:

Java search: 0
Guava search: 36

谁能说出为什么这不适合我的用例或者番石榴中有bug?

Can anyone tell why this isn't suited for my use case or is there a bug in Guava?

推荐答案

事实证明,问题是多次调用 SetView.size().由于 SetView 是两个集合的交集的(实时)视图,因此每次都需要重新计算交集大小.

It turns out the problem was multiple calls to SetView.size(). As SetView is a (live) view of the intersection of the two sets, the intersection size needs to be recalculated every time.

public static <E> SetView<E> intersection( final Set<E> set1, final Set<?> set2) {
//...
  return new SetView<E>() {
    @Override public Iterator<E> iterator() {
      return Iterators.filter(set1.iterator(), inSet2);
    }
    @Override public int size() {
      return Iterators.size(iterator());
    }
    //...
  };
}

从这里可以看出,在这种情况下,重新计算的意思是遍历整个视图,这可能会非常耗时.

As can be seen here, what recalculation means in this case is iterating across the entire view, which can be quite time-consuming.

因此,解决此问题的方法是确保仅调用一次 size()并将其值存储(如果您知道基础集合不会更改),或者不可能,例如,通过 ImmutableSet.copyOf()创建交点的副本.

The way to get around this is therefore either to make sure size() is only called once and the value is stored (if you know the underlying sets won't change), or if that isn't possible, create a copy of the intersection via ImmutableSet.copyOf() (for example).

这篇关于番石榴Sets.intersection不良表现的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆