Java Collections Framework中常见方法(大小)的意外复杂性? [英] Unexpected complexity of common methods (size) in Java Collections Framework?

查看:150
本文介绍了Java Collections Framework中常见方法(大小)的意外复杂性?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

最近我感到惊讶的是,一些Java集合没有定时运行方法size()。



虽然我了解到并发集合的实现做出了一些妥协,作为并发增益的权衡(ConcurrentLinkedQueue,ConcurrentSkipListSet,LinkedTransferQueue等中的大小为O(n))的好消息是,这在API文档中已被正确记录。



关心我的是对某些集合方法返回的视图的方法大小的性能。例如, TreeSet.tailSet 返回其元素大于或等于fromElement的后台集合的部分的视图。令我惊讶的是,返回SortedSet的调用大小是时间线性的,也就是O(n)。至少这是我从OpenJDK的源代码中挖掘出来的:
在TreeSet中实现为TreeMap中的包装,在TreeMap中,有一个EntrySetView类,其大小方法如下:

 抽象类EntrySetView扩展AbstractSet< Map.Entry< K,V>> {
private transient int size = -1,sizeModCount;

public int size(){
if(fromStart&& toEnd)
return m.size();
if(size == -1 || sizeModCount!= m.modCount){
sizeModCount = m.modCount;
size = 0;
Iterator i = iterator();
while(i.hasNext()){
size ++;
i.next();
}
}
返回大小;
}

....
}

这意味着第一次调用大小是O(n),只要不修改后备映射,它就被缓存。我在API文档中找不到这个事实。在子树大小缓存中,更有效的实现将是O(log n)与存储器权衡。由于这样做是为了避免代码复制(TreeSet作为TreeMap上的包装)而进行的这种折衷,我看不出为什么不应该出于性能原因而被理由。



无论我对(TreeSet)OpenJDK执行情况(非常简短)进行了简单的分析,我想知道是否有关于许多这样的操作的详细和完整的文档,特别是完全意外的操作?

解决方案


例如, TreeSet.tailSet 返回元素大于或等于 fromElement 的后台集的部分。令我惊讶的是,在大小上调用 SortedSet 在时间上是线性的,即 O(n)


对我来说并不奇怪。考虑javadoc中的这个句子:


返回的集合由此集合支持,因此返回集合中的更改将反映在此集合中,而反之亦然。


由于尾​​集是后台集的动态视图,因此其大小必须在实践中动态计算。替代方案将要求当对后备套件进行更改时,必须调整所有现有尾座(和耳机)视图的尺寸。这将使得后台设备的更新更加昂贵,并且会提出存储管理问题。 (为了更新视图大小,后备集将需要引用所有现有的视图集...并且这是潜在的内存泄漏。)



现在你有关文件的一点。但事实上,javadocs并不表示视图集合的复杂性。而且,事实上,它甚至没有记录 TreeSet.size() O(1)!事实上,它只记录了 add 删除包含操作。







我想知道是否有详细和完整的许多这样的操作的文档,特别是完全意外的操作的文档?


AFAIK,不,当然不是来自Sun / Oracle .. 。


Recently, I've been surprised by the fact that some Java collections don't have constant time operation of method size().

While I learned that concurrent implementations of collections made some compromises as a tradeoff for gain in concurrency (size being O(n) in ConcurrentLinkedQueue, ConcurrentSkipListSet, LinkedTransferQueue, etc.) good news is that this is properly documented in API documentation.

What concerned me is the performance of method size on views returned by some collections' methods. For example, TreeSet.tailSet returns a view of the portion of backing set whose elements are greater than or equal to fromElement. What surprised me a lot is that calling size on returned SortedSet is linear in time, that is O(n). At least that is what I managed to dig up from the source code of OpenJDK: In TreeSet is implemented as wrapper over TreeMap, and within a TreeMap, there is EntrySetView class whose size method is as follows:

abstract class EntrySetView extends AbstractSet<Map.Entry<K,V>> {
    private transient int size = -1, sizeModCount;

    public int size() {
        if (fromStart && toEnd)
            return m.size();
        if (size == -1 || sizeModCount != m.modCount) {
            sizeModCount = m.modCount;
            size = 0;
            Iterator i = iterator();
            while (i.hasNext()) {
                size++;
                i.next();
            }
        }
        return size;
    }

    ....
}

This means that first time size is called is O(n) and then it's cached as long as backing map is not modified. I was not able to find this fact in the API documentation. More efficient implementation would be O(log n) with memory tradeoff in caching of subtree sizes. Since such tradeoffs are being made for avoiding code duplication (TreeSet as wrapper over TreeMap), I don't see a reason why they should not be made for performance reasons.

Disregarding me being right or wrong with my (very brief) analysis of the OpenJDK implementation of TreeSet, I would like to know is there a detailed and complete documentation on performance of many such operations especially ones which are completely unexpected?

解决方案

For example, TreeSet.tailSet returns a view of the portion of backing set whose elements are greater than or equal to fromElement. What surprised me a lot is that calling size on returned SortedSet is linear in time, that is O(n).

To me it is not surprising. Consider this sentence from the javadoc:

"The returned set is backed by this set, so changes in the returned set are reflected in this set, and vice-versa."

Since the tail set is a dynamic view of the backing set, it follows that its size has to be calculated dynamically in practice. The alternative would require that when a change was made to the backing set, it would have to adjust the sizes of all extant tailset (and headset) views. That would make updates to the backing set more expensive, AND it would present a storage management problem. (In order to update the view sizes, the backing set would need references to all existing view sets ... and that is a potential hidden memory leak.)

Now you do have a point regarding the documentation. But in fact, the javadocs says nothing about the complexity of the view collections. And, indeed, it doesn't even document that TreeSet.size() is O(1)! In fact, it only documents the complexity of the add, remove and contains operations.


I would like to know is there a detailed and complete documentation on performance of many such operations especially ones which are completely unexpected?

AFAIK, No. Certainly, not from Sun / Oracle ...

这篇关于Java Collections Framework中常见方法(大小)的意外复杂性?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆