桶的索引数 [英] Indexing count of buckets

查看:104
本文介绍了桶的索引数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

所以,这是我的一点小问题。

So, here is my little problem.

让我们说我有水桶列表的<子> 0 ...一个<子> N 分别含有L&LT; = C 0 .. 。ç<子> N &LT; H项。我可以决定的L和H限制。我甚至可以动态地更新它们,但我不认为这将有助于太多。

Let's say I have a list of buckets a0 ... an which respectively contain L <= c0 ... cn < H items. I can decide of the L and H limits. I could even update them dynamically, though I don't think it would help much.

桶的顺序很重要。我不能去交换他们身边。

The order of the buckets matter. I can't go and swap them around.

现在,我想这些指标桶,因此:

Now, I'd like to index these buckets so that:

  • 在我知道的项目总数
  • 在我查找了第i个元素
  • 我可以添加/任何桶删除项目,有效地更新索引

似乎很容易吧?看到这些条件我马上想到了一个树状数组。这就是他们都是为了真的。

Seems easy right ? Seeing these criteria I immediately thought about a Fenwick Tree. That's what they are meant for really.

然而,当你想到的用例,其他几个用例蠕变:

However, when you think about the use cases, a few other use cases creep in:

  • 如果一个桶计数低于L,桶必须消失(不要担心项目还)
  • 如果一个桶计数到达H,那么新的水桶必须创建,因为这一个充满

我还没有想出如何有效地编辑树状数组:删除/添加一个节点,而重建整个树...

I haven't figured out how to edit a Fenwick Tree efficiently: remove / add a node without rebuilding the whole tree...

当然,我们可以设置L = 0,因此,取消会成为不必要的,但是加入的项目不能真正回避。

Of course we could setup L = 0, so that removing would become unecessary, however adding items cannot really be avoided.

因此​​,这里的问题:

So here is the question:

你知道无论是该指数更加合理的结构,或如何更新分域树?

Do you know either a better structure for this index or how to update a Fenwick Tree ?

主要关注的是效率,因为我不打算实现它的缓存/内存的考虑是值得担忧的。

The primary concern is efficiency, and because I do plan to implement it cache/memory considerations are worth worrying about.

背景的:

我想拿出一个结构有点类似于B树而排名跳转列表,但有局部索引。这两个结构的问题是,索引被保持沿数据,这是低效率的高速缓存项(例如,你需要获取从存储器多页)。数据库实现建议保持从实际数据中分离索引更多的缓存喜欢的,从而更有效地

I am trying to come up with a structure somewhat similar to B-Trees and Ranked Skip Lists but with a localized index. The problem of those two structures is that the index is kept along the data, which is inefficient in term of cache (ie you need to fetch multiple pages from memory). Database implementations suggest that keeping the index isolated from the actual data is more cache-friendly, and thus more efficient.

推荐答案

我已经明白你的问题如下:

I have understood your problem as:

每个桶中有一个内部秩序和水桶本身具有的订单,所以所有的元素有一定的顺序,你需要在订购的第i个元素。

Each bucket has an internal order and buckets themselves have an order, so all the elements have some ordering and you need the ith element in that ordering.

要解决的是:

你可以做的是保持一个累积值树的叶子节点(X1,X2,...,XN)是桶的大小。一个节点的值是它的直接子的值的总和。保持2呐力量将使它简单(你总是可以在最后零大小桶垫吧)和树将是一棵完整的树。

What you can do is maintain a 'cumulative value' tree where the leaf nodes (x1, x2, ..., xn) are the bucket sizes. The value of a node is the sum of values of its immediate children. Keeping n a power of 2 will make it simple (you can always pad it with zero size buckets in the end) and the tree will be a complete tree.

对应每一个桶你将保持一个指向对应的叶节点。

Corresponding to each bucket you will maintain a pointer to the corresponding leaf node.

例如,假设桶的大小是2,1,4,8。

Eg, say the bucket sizes are 2,1,4,8.

树看起来像

     15
    /  \
   3    12
  / \  / \
 2  1  4  8

如果你想要的总数,读取根节点的值。

If you want the total count, read the value of the root node.

如果你想修改一些XK(即变化相对应的桶的大小),你可以走了树下面父指针,更新的价值。

If you want to modify some xk (i.e. change correspond bucket size), you can walk up the tree following parent pointers, updating the values.

例如,如果您添加4项第二桶将是(标有*的节点是改变了的)

For instance if you add 4 items to the second bucket it will be (the nodes marked with * are the ones that changed)

     19*
    /   \
   7*    12
  / \   / \
 2  5*  4  8

如果你想找到的第i个元素,你走在上面的树,有效地做二进制搜索。你已经有一个左孩子和右孩子计数。如果当前节点i>左子节点值,减去你在正确的树中的左子节点值和递归。如果我&LT; =左子节点值,你去左边,并再次递归

If you want to find the ith element, you walk down the above tree, effectively doing the binary search. You already have a left child and right child count. If i > left child node value of current node, you subtract the left child node value and recurse in the right tree. If i <= left child node value, you go left and recurse again.

假设你想找到在上面的树中的第9个元素:

Say you wanted to find the 9th element in the above tree:

由于根的左孩子是7 LT; 9。 您可以从9(获得2)减去7和向右走。

Since left child of root is 7 < 9. You subtract 7 from 9 (to get 2) and go right.

由于2版; 4(12的左孩子),你去左边。

Since 2 < 4 (the left child of 12), you go left.

您是在对应于第三桶的叶节点。现在,您需要选择在桶中的第二个元素。

You are at the leaf node corresponding to the third bucket. You now need to pick the second element in that bucket.

如果你需要添加一个新的水桶,你双倍的树的大小(如果需要),通过添加新的根,使现有的树左子,并与所有零桶添加一个新的树,除非你添加的(我们是新树的最左边的叶)。这将摊销O(1)时间,添加新的价值树。需要注意的是,你可以在结尾,中间只加一斗,而不是任何地方。

If you have to add a new bucket, you double the size of your tree (if needed) by adding a new root, making the existing tree the left child and add a new tree with all zero buckets except the one you added (which we be the leftmost leaf of the new tree). This will be amortized O(1) time for adding a new value to the tree. Caveat is you can only add a bucket at the end, and not anywhere in the middle.

获得总计数为O(1)。 更新项目的单桶/查找是O(LOGN)。

Getting the total count is O(1). Updating single bucket/lookup of item are O(logn).

添加新桶摊销O(1)。

Adding new bucket is amortized O(1).

使用空间为O(n)。

而不是一个二叉树,你也许可以用B树做同样的。

Instead of a binary tree, you can probably do the same with a B-Tree.

这篇关于桶的索引数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆