Scala的Vector如何工作? [英] How does Scala's Vector work?

查看:136
本文介绍了Scala的Vector如何工作?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我阅读了本页关于Scala集合的时间复杂性的信息。就像说的那样, Vector 的复杂度对于所有操作来说都是 eC

I read this page about the time complexity of Scala collections. As it says, Vector's complexity is eC for all operations.

这让我想知道 Vector 是什么。我阅读了文档,它说:

It made me wonder what Vector is. I read the document and it says:


由于向量在快速随机选择和快速随机功能更新之间取得了良好的平衡,因此它们目前是不可变索引序列的
默认实现。它由具有分支因子32的
a小字节序位映射向量trie支持。
局部性很好,但不连续,这对于
大序列非常有用。

Because vectors strike a good balance between fast random selections and fast random functional updates, they are currently the default implementation of immutable indexed sequences. It is backed by a little endian bit-mapped vector trie with a branching factor of 32. Locality is very good, but not contiguous, which is good for very large sequences.

与Scala的其他所有内容一样,它非常模糊。 Vector 实际如何工作?

As with everything else about Scala, it's pretty vague. How actually does Vector work?

推荐答案

此处的关键字是 Trie
Vector被实现为 Trie 数据结构。
请参见 http://en.wikipedia.org/wiki/Trie

The keyword here is Trie. Vector is implemented as a Trie datastructure. See http://en.wikipedia.org/wiki/Trie.

更准确地说,它是位图向量trie。我刚刚在这里找到了足够简洁的结构描述(以及实现-显然在Rust中):

More precisely, it is a "bit-mapped vector trie". I've just found a consise enough description of the structure (along with an implementation - apparently in Rust) here:

https://bitbucket.org/astrieanna/bitmapped-vector-trie

最相关的摘录是:


位图矢量Trie基本上是一棵32棵树。级别1是大小为32的数组,无论数据类型如何。级别2由32个级别1组成。依此类推,直到:Level 7是2个Level 6的数组。

A Bitmapped Vector Trie is basically a 32-tree. Level 1 is an array of size 32, of whatever data type. Level 2 is an array of 32 Level 1's. and so on, until: Level 7 is an array of 2 Level 6's.

更新:赖玉轩对复杂性的评论:

UPDATE: In reply to Lai Yu-Hsuan's comment about complexity:

在这里,我必须假设您的意思是深度 :-D。图例中的 eC表示该操作实际上需要恒定的时间,但这可能取决于某些假设,例如向量的最大长度或哈希键的分布。

I will have to assume you meant "depth" here :-D. The legend for "eC" says "The operation takes effectively constant time, but this might depend on some assumptions such as maximum length of a vector or distribution of hash keys.".

如果您愿意考虑最坏的情况,并且考虑到向量的最大大小有上限,那么可以肯定地说,复杂度是恒定的。
假设我们认为最大大小为2 ^ 32,那么这意味着在任何情况下,最坏的情况是最多7个操作。
然后,我们总是可以考虑任何类型的集合的最坏情况,找到一个上限并说这是恒定的复杂性,但是对于一个列表,这意味着一个40亿的常数,这不是

If you are willing to consider the worst case, and given that there is an upper bound to the maximum size of the vector, then yes indeed we can say that the complexity is constant. Say that we consider the maximum size to be 2^32, then this means that the worst case is 7 operations at most, in any case. Then again, we can always consider the worst case for any type of collection, find an upper bound and say this is constant complexity, but for a list by example, this would mean a constant of 4 billions, which is not quite practical.

但是Vector是相反的,7个操作比实际操作要多,这就是我们能够负担得起的在实践中考虑其复杂性常数的方法 em>。

But Vector is the opposite, 7 operations being more than practical, and this is how we can afford to consider its complexity constant in practice.

另一种查看方式:我们不是在谈论log(2,N),而是在谈论log(32,N)。如果您尝试绘制,将会发现它实际上是一条水平线。因此,务实地说,随着收藏的增长,您将永远看不到处理时间的大幅增加。
是的,它仍然不是真正恒定的(这就是为什么它被标记为 eC而不仅仅是 C的原因),并且您将能够看到短向量周围的差异(但同样,非常小差异,因为操作数量增长非常缓慢)。

Another way to look at this: we are not talking about log(2,N), but log(32,N). If you try to plot that you'll see it is practically an horizontal line. So pragmatically speaking you'll never be able to see much increase in processing time as the collection grows. Yes, that's still not really constant (which is why it is marked as "eC" and not just "C"), and you'll be able to see a difference around short vectors (but again, a very small difference because the number of operations grows so much slowly).

这篇关于Scala的Vector如何工作?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆