Java Collections Framework中常见方法(大小)的意外复杂性? [英] Unexpected complexity of common methods (size) in Java Collections Framework?

查看:191
本文介绍了Java Collections Framework中常见方法(大小)的意外复杂性?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

最近,我感到惊讶的是,一些Java集合没有方法size()的常量时间操作。

Recently, I've been surprised by the fact that some Java collections don't have constant time operation of method size().

虽然我学到了并发好的消息是,这在API文档中已经被正确地记录了。

While I learned that concurrent implementations of collections made some compromises as a tradeoff for gain in concurrency (size being O(n) in ConcurrentLinkedQueue, ConcurrentSkipListSet, LinkedTransferQueue, etc.) good news is that this is properly documented in API documentation.

我关心的是方法大小对一些集合方法返回的视图的性能。例如, TreeSet.tailSet 会返回其中元素大于或等于fromElement的后备集的部分的视图。让我感到惊讶的是,在返回的SortedSet的调用大小在时间上是线性的,也就是O(n)。至少这是我设法从OpenJDK的源代码挖掘:
在TreeSet中实现为TreeMap的包装,在TreeMap中,有EntrySetView类,其大小方法如下:

What concerned me is the performance of method size on views returned by some collections' methods. For example, TreeSet.tailSet returns a view of the portion of backing set whose elements are greater than or equal to fromElement. What surprised me a lot is that calling size on returned SortedSet is linear in time, that is O(n). At least that is what I managed to dig up from the source code of OpenJDK: In TreeSet is implemented as wrapper over TreeMap, and within a TreeMap, there is EntrySetView class whose size method is as follows:

abstract class EntrySetView extends AbstractSet<Map.Entry<K,V>> {
    private transient int size = -1, sizeModCount;

    public int size() {
        if (fromStart && toEnd)
            return m.size();
        if (size == -1 || sizeModCount != m.modCount) {
            sizeModCount = m.modCount;
            size = 0;
            Iterator i = iterator();
            while (i.hasNext()) {
                size++;
                i.next();
            }
        }
        return size;
    }

    ....
}

这意味着第一次调用的时间大小为O(n),然后只要不修改后备映射,它就被缓存。我不能在API文档中找到这个事实。更高效的实现是在缓存子树大小时存储器权衡的O(log n)。因为这样的折衷是为了避免代码重复(TreeSet作为TreeMap的封装),我没有看到为什么不应该出于性能原因的原因。

This means that first time size is called is O(n) and then it's cached as long as backing map is not modified. I was not able to find this fact in the API documentation. More efficient implementation would be O(log n) with memory tradeoff in caching of subtree sizes. Since such tradeoffs are being made for avoiding code duplication (TreeSet as wrapper over TreeMap), I don't see a reason why they should not be made for performance reasons.

忽略我对我的(非常简短)分析的OpenJDK实现的TreeSet,我想知道是否有一个详细和完整的文档性能的许多这样的操作,特别是那些是完全意想不到的?

Disregarding me being right or wrong with my (very brief) analysis of the OpenJDK implementation of TreeSet, I would like to know is there a detailed and complete documentation on performance of many such operations especially ones which are completely unexpected?

推荐答案


例如, TreeSet.tailSet 其元素大于或等于 fromElement 的后备集的部分。令我惊讶的是,在返回的 SortedSet 上调用 size 在时间上是线性的,即 O(n)

For example, TreeSet.tailSet returns a view of the portion of backing set whose elements are greater than or equal to fromElement. What surprised me a lot is that calling size on returned SortedSet is linear in time, that is O(n).

对我来说,这并不奇怪。考虑javadoc中的这句话:

To me it is not surprising. Consider this sentence from the javadoc:


返回的集合由这个集合支持,所以返回的集合的变化反映在这个集合,反之亦然。

"The returned set is backed by this set, so changes in the returned set are reflected in this set, and vice-versa."

由于尾​​部集合是后备集合的动态视图,因此它的大小必须在实践中动态计算。替代方案将要求当对后备集进行改变时,它将必须调整所有现存尾缀(和耳机)视图的大小。这将使对支持集的更新更昂贵,并且它将提出存储管理问题。 (为了更新视图大小,后台集合将需要引用所有现有的视图集...这是潜在的隐藏内存泄漏。)

Since the tail set is a dynamic view of the backing set, it follows that its size has to be calculated dynamically in practice. The alternative would require that when a change was made to the backing set, it would have to adjust the sizes of all extant tailset (and headset) views. That would make updates to the backing set more expensive, AND it would present a storage management problem. (In order to update the view sizes, the backing set would need references to all existing view sets ... and that is a potential hidden memory leak.)

现在你有关于文档的一点。但事实上,javadocs没有说明视图集合的复杂性。确实,它甚至没有记录 TreeSet.size() O(1)!事实上,它只记录了 add remove 的复杂性。操作。

Now you do have a point regarding the documentation. But in fact, the javadocs says nothing about the complexity of the view collections. And, indeed, it doesn't even document that TreeSet.size() is O(1)! In fact, it only documents the complexity of the add, remove and contains operations.


我想知道是否有详细和完整许多此类操作的性能文档,特别是完全出乎意料的操作?

I would like to know is there a detailed and complete documentation on performance of many such operations especially ones which are completely unexpected?

AFAIK, 。

这篇关于Java Collections Framework中常见方法(大小)的意外复杂性?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆