Store ordered list in database (Gap approach)


Question


I want to keep a large ordered list (millions of elements) in Google App Engine datastore. Fast insertion is required.

The simplest way would be adding an indexed property (or column) "order_num" representing the order. For example, a list [A, B, C] would be stored like this:

content   order_num
--------------------
   A         1
   B         2
   C         3  

However, this doesn't give you fast insertion. For example, if I want to insert X after A, I have to renumber B and C to "make room" for X, i.e., B becomes 3, C becomes 4, and X gets 2. This would be a disaster if I have millions of elements.

I found a feasible solution called the "gap approach", described here. This approach keeps a gap between the order_num values of adjacent elements, like this:

content   order_num
--------------------
   A         1000
   B         2000
   C         3000

When I want to insert X after A, I can simply add X with its order_num set to (1000 + 2000) / 2 = 1500; no renumbering is required.
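As a rough illustration of the midpoint rule above, here is a minimal Python sketch. The helper name `midpoint_key` is made up for this example and is not part of any datastore API; it only shows how the new order_num is computed and when the gap runs out.

```python
from typing import Optional


def midpoint_key(left: int, right: int) -> Optional[int]:
    """Return an integer strictly between left and right, or None if no gap is left."""
    if right - left < 2:
        return None  # neighbours are adjacent integers; renumbering is needed
    return (left + right) // 2


# Insert X between A (1000) and B (2000), as in the example above.
x_key = midpoint_key(1000, 2000)

# Worst case: keep inserting directly after A. Each insert halves the gap.
inserts = 0
right = 2000
while (key := midpoint_key(1000, right)) is not None:
    right = key
    inserts += 1
```

With a gap of 1000, only 9 consecutive insertions at the same position fit before the integers run out, since the gap is halved every time.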

But as these gaps shrink, renumbering may eventually be required. My question is: are there any known strategies for renumbering, and for deciding the size of the gaps?

Thanks!

UPDATE

Here's more detail. Say I have a list of elements in the database, and every element has an integer property named my_num. The value of my_num is an arbitrary positive integer. Suppose I have a list [A, B, C, D], and their my_num values are

 element    my_num   
---------------------
   A          5        
   B          2
   C         10
   D          7

Now, let's define an accum() operator:

accum(n) = element[0].my_num + element[1].my_num + ... + element[n-1].my_num

So the accum values for each element are

 element    my_num   accum 
----------------------------
   A          5        5
   B          2        7
   C         10       17
   D          7       24

But accum values probably should NOT be stored in the database, because the list is constantly updated and every insertion would change the accum values of all later elements. It's better to keep insertion fast.

I want to design a query whose input is an integer x:

query(x) = element[i] if accum(i) < x <= accum(i+1)

For example, query(11) is C and query(3) is A.

Is it possible to design a datastore schema that makes this query fast? Or is the only way to accumulate the values one by one at query time, which is what I'm planning to do?
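For concreteness, the one-by-one accumulation mentioned above could look like the following Python sketch. It works on an in-memory list of (name, my_num) pairs rather than on real datastore entities, and the function names `query_scan` and `query_bisect` are invented for this example.

```python
from bisect import bisect_left
from itertools import accumulate

# The example list from the table above, in list order.
elements = [("A", 5), ("B", 2), ("C", 10), ("D", 7)]


def query_scan(items, x):
    """Walk the list in order, adding up my_num until the running total reaches x."""
    total = 0
    for name, my_num in items:
        total += my_num
        if x <= total:
            return name
    return None  # x is larger than the sum of all my_num values


def query_bisect(items, x):
    """Same lookup, using binary search over prefix sums computed in memory."""
    sums = list(accumulate(my_num for _, my_num in items))
    i = bisect_left(sums, x)
    return items[i][0] if i < len(sums) else None
```

Both functions still read every element once, so neither avoids the linear cost; the second only helps if the prefix sums can be cached between queries.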

Solution

Alternatively, could you use decimals, or a string?

content     order
-------------------- 
   A         'a' 
   B         'b' 
   C         'c'

Then, to insert D between 'a' and 'b', give it the value 'aa', which sorts after 'a' and before 'b'.

An algorithm for generating the strings is best shown for a binary string: if you want to insert something between "1011" and "1100", do the following:

  • A value = 1+0*(1/2)+1*(1/4)+1*(1/8)
  • B value = 1+1*(1/2)+0*(1/4)+0*(1/8)

Average the two: new value = 1+0*(1/2)+1*(1/4)+1*(1/8)+1*(1/16), so the new string = "10111"

content     order
-------------------- 
   A         '1011' 
   new!      '10111' 
   B         '1100' 
   C         '1101'

Since you always average two values, the average will always have a finite binary expansion, and therefore a finite string. It effectively defines a binary tree.
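The averaging step can be sketched in Python as follows. This is only an illustration under the convention used in the example above (the binary point sits after the first digit, so "1011" means 1.011 in base 2); the function name `between` is made up here.

```python
def between(a: str, b: str) -> str:
    """Return the binary string for the average of a and b.

    Each string is read as a binary fraction with the point after the
    first digit, so "1011" stands for 1.011 in base 2.
    """
    width = max(len(a), len(b))
    # Padding with trailing zeros does not change the value of a fraction.
    total = int(a.ljust(width, "0"), 2) + int(b.ljust(width, "0"), 2)
    # Dividing by two moves the point one place, which costs one extra digit.
    avg = format(total, "b").zfill(width + 1)
    # Drop trailing zeros so keys stay as short as possible.
    return avg.rstrip("0") or "0"


new_key = between("1011", "1100")
keys = ["1011", new_key, "1100", "1101"]
```

Plain string comparison keeps these keys in the intended order, because a shorter key that is a prefix of a longer one sorts first. Each further insertion at the same spot makes the new key one character longer.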

As you know, binary trees don't always turn out well balanced; in other words, some strings will be much longer than others after enough insertions. To keep them short, you could use any even base. It has to be even because then the expansion of any average of two values is finite.

But whatever you do, strings will probably become long, and you'll have to do some housekeeping at some point, cleaning up the values so that the string space is used efficiently. What this algorithm gives you is the certainty that between cleanups, the system will keep ticking along.
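The answer does not spell out what the cleanup looks like. One possible version, given purely as an assumption, is to walk the items in their current order and hand out fresh binary keys of equal width; the function name `rekey` is hypothetical.

```python
def rekey(ordered_items):
    """Assign fresh equal-width binary string keys to items that are already in order.

    This is an assumed cleanup strategy, not one prescribed by the answer.
    """
    n = len(ordered_items)
    # Enough binary digits to write the numbers 1..n; at least one digit.
    width = max(1, n.bit_length())
    return [
        (format(i, f"0{width}b"), item)
        for i, item in enumerate(ordered_items, start=1)
    ]


compacted = rekey(["A", "X", "B", "C"])
```

Keys of equal width compare as strings in the same order as the numbers they encode, so the list order is preserved while every key shrinks back to the minimum length.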
