Store list in key-value database
Problem description
I am looking for the best way to store lists associated with a key in a key-value database (like Berkeley DB or LevelDB).
For example: I have users and orders from user to user. I want to store a list of order ids for each user, for fast access with range selects (for pagination).
How to store this structure?
I don't want to store it in a serialized format for each user:
user_1_orders = serialize(1,2,3..)
user_2_orders = serialize(1,2,3..)
because the list can be long.
I thought about a separate db file for each user, with the order ids stored as keys in it, but this does not solve the range select problem. What if I want to get a user's order ids in the range [5000:5050]?
I know about Redis, but I am interested in key-value implementations like Berkeley DB or LevelDB.
Let's start with a single list. You can work with a single hashmap:
- store the count of the user's orders in row 0
- for each new order, increment the count and store the order in a new row keyed by the new count
So your hashmap looks like the following:
key | value
-------------
0 | 5
1 | tomato
2 | celery
3 | apple
4 | pie
5 | meat
Steadily incrementing the key ensures that every key is unique. Given that the db is key-ordered and that the pack function translates integers into byte arrays that sort in the same order as the integers, you can fetch slices of the list. To fetch orders between 5000 and 5050 you can use bsddb's Cursor.set_range or leveldb's createReadStream (JS API).
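To illustrate why the pack function matters, here is a minimal sketch. It assumes keys are non-negative integers that fit in 64 bits and uses Python's struct module with a fixed-width big-endian encoding; this is one possible pack, not necessarily the one used by wiredtiger.packing or bytekey.

```python
import struct

def pack(n):
    """Encode a non-negative integer as 8 big-endian bytes (an assumed format)."""
    return struct.pack(">Q", n)

keys = [5, 256, 1, 70000]

# Byte-wise (lexicographic) order of the packed keys matches numeric order.
packed_sorted = sorted(pack(k) for k in keys)
decoded = [struct.unpack(">Q", b)[0] for b in packed_sorted]

# By contrast, plain decimal strings sort "256" before "5".
string_sorted = sorted(str(k) for k in keys)
```

With a fixed-width big-endian encoding, a cursor positioned at pack(5000) and advanced until pack(5050) visits the rows in numeric order.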
Now let's expand to multiple users' orders. If you can open several hashmaps, you can apply the above with one hashmap per user. However, you may hit system limits (maximum number of open file descriptors or maximum number of files per directory). So you can instead share a single hashmap among several users.
What I explain in the following works for both leveldb and bsddb, provided that you pack keys so that they sort correctly in lexicographic (byte) order. So I will assume that you have a pack function. In bsddb you have to build a pack function yourself. Have a look at wiredtiger.packing or bytekey for inspiration.
The principle is to namespace the keys using the user's id. It's also called key composition.
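As a rough sketch of key composition, the snippet below packs a (user id, index) pair into one byte string. The ">QQ" format (two unsigned 64-bit big-endian integers) is an assumption made for illustration; any encoding that preserves ordering would work.

```python
import struct

KEY_FORMAT = ">QQ"  # assumption: user id and index both fit in unsigned 64 bits

def pack(user_id, index):
    """Build a composite key: user id first, then the per-user index."""
    return struct.pack(KEY_FORMAT, user_id, index)

def unpack(key):
    """Split a composite key back into (user_id, index)."""
    return struct.unpack(KEY_FORMAT, key)

keys = [pack(32, 1), pack(1, 2), pack(32, 0), pack(1, 0), pack(1, 1)]

# Sorting the raw bytes groups the rows by user, then by index within a user.
ordered = [unpack(k) for k in sorted(keys)]
```

Because the user id comes first in the key, all rows of one user are contiguous in the database, which is what makes a per-user range scan possible.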
Say your database looks like the following:
key      | value
-------------------
1  | 0   | 2       <--- count row for user 1
1  | 1   | tomato
1  | 2   | orange
...      | ...
32 | 0   | 1       <--- count row for user 32
32 | 1   | banana
...      | ...
You create this database with the following (pseudo) code:
db.put(pack(1, make_uid(1)), 'tomato')
db.put(pack(1, make_uid(1)), 'orange')
...
db.put(pack(32, make_uid(32)), 'banana')
The make_uid implementation looks like this:
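def make_uid(user_uid):
    # retrieve the current count (row 0 of this user's namespace)
    counter_key = pack(user_uid, 0)
    value = db.get(counter_key)
    # assumption: db.get returns None when the user has no counter row yet
    value = 0 if value is None else int(value)
    value += 1  # increment
    # save new count (assumption: the store holds bytes, so encode the integer)
    db.put(counter_key, str(value).encode())
    return value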
Then you have to do the correct range lookup. It is similar to the single-list case, but with a composite key. Using the bsddb API cursor.set_range(key), we retrieve all items between 5000 and 5050 for user 42:
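def user_orders_slice(user_id, start, end):
    # position the cursor on the first key >= (user_id, start)
    key, value = cursor.set_range(pack(user_id, start))
    while True:
        key_user_id, order_id = unpack(key)
        # stop past the end of the slice or once we reach another user's keys
        if key_user_id != user_id or order_id > end:
            break
        # the value is probably packed somehow...
        yield value
        key, value = cursor.next()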
No error checks are done. Among other things, the slice user_orders_slice(42, 5000, 5050) is not guaranteed to return 51 items if you delete items from the list. A correct way to query, say, 50 items is to implement a user_orders_query(user_id, start, limit).
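A possible shape for user_orders_query is sketched below. Since bsddb and leveldb bindings may not be installed, it runs against SortedStore, a hypothetical in-memory stand-in for a key-ordered database; its put, delete and scan_from methods are invented for this example and are not part of either library. The ">QQ" key format is likewise an assumption.

```python
import bisect
import struct

def pack(user_id, index):
    return struct.pack(">QQ", user_id, index)

def unpack(key):
    return struct.unpack(">QQ", key)

class SortedStore:
    """Hypothetical in-memory stand-in for a key-ordered database."""

    def __init__(self):
        self._keys = []    # keys kept in sorted (byte) order
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def delete(self, key):
        if key in self._values:
            del self._values[key]
            self._keys.remove(key)

    def scan_from(self, key):
        """Yield (key, value) pairs with keys >= key, in key order."""
        i = bisect.bisect_left(self._keys, key)
        for k in self._keys[i:]:
            yield k, self._values[k]

def user_orders_query(db, user_id, start, limit):
    """Return up to `limit` values for `user_id` with index >= start."""
    results = []
    for key, value in db.scan_from(pack(user_id, start)):
        if len(results) >= limit:
            break
        key_user_id, _index = unpack(key)
        if key_user_id != user_id:
            break  # ran past this user's keys
        results.append(value)
    return results

db = SortedStore()
for i in range(1, 11):
    db.put(pack(42, i), "order-%d" % i)
db.put(pack(43, 1), "other-user")
db.delete(pack(42, 4))  # leave a gap in user 42's list

page = user_orders_query(db, 42, 3, 3)
last_page = user_orders_query(db, 42, 9, 5)
```

Counting returned items instead of comparing against an end index means a page stays full even when earlier rows were deleted, and the user id check keeps the scan from leaking into the next user's rows.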
I hope you get the idea.