Store a list in a key-value database


Problem description

I'm looking for the best way to store lists associated with a key in a key-value database (such as Berkeley DB or LevelDB).

For example: I have users, and orders between users. I want to store a list of order IDs for each user so that I can access it quickly with range selects (for pagination).

How should I store this structure?

I don't want to store it in a serialized format for each user:

user_1_orders = serialize(1,2,3..)
user_2_orders = serialize(1,2,3..)

because the list can be long.

I have thought about a separate db file for each user, with the order IDs stored as keys in it, but this does not solve the range select problem. What if I want to get a user's order IDs in the range [5000:5050]?

I know about Redis, but I'm interested in key-value implementations like Berkeley DB or LevelDB.

Solution

Let's start with a single list. You can work with a single hashmap:

  1. store the count of the user's orders in row 0
  2. for each new order, increment the count and store the order in a new row keyed by the new count

So your hashmap looks like the following:

key | value
-------------
 0  |   5
 1  | tomato
 2  | celery
 3  | apple
 4  | pie
 5  | meat

Steadily incrementing the key makes sure that every key is unique. Given that the db is ordered by key, and that the pack function translates integers into byte strings that sort in the same order as the integers, you can fetch slices of the list. To fetch the orders between 5000 and 5050 you can use bsddb's Cursor.set_range or leveldb's createReadStream (js api).
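
As a rough illustration of the single-list layout, here is a minimal sketch that uses a sorted in-memory list as a stand-in for an ordered store such as leveldb or bsddb. The `SortedStore` class, its `seek` method, the `append` and `list_slice` helpers, and the 4-byte big-endian `pack` are all assumptions made for this example; none of them come from either library.

```python
import bisect
import struct


def pack(n):
    # Big-endian unsigned 32-bit: byte order matches numeric order.
    return struct.pack(">I", n)


def unpack(key):
    return struct.unpack(">I", key)[0]


class SortedStore:
    """Hypothetical stand-in for a key-ordered store."""

    def __init__(self):
        self._keys = []  # sorted list of byte keys
        self._data = {}  # key -> value

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def seek(self, key):
        # Yield (key, value) pairs starting at the first key >= `key`.
        i = bisect.bisect_left(self._keys, key)
        for k in self._keys[i:]:
            yield k, self._data[k]


def append(db, item):
    # Row 0 holds the count; the new item is stored at index count + 1.
    count = db.get(pack(0), 0) + 1
    db.put(pack(0), count)
    db.put(pack(count), item)
    return count


def list_slice(db, start, end):
    # Items whose index lies in [start, end], inclusive.
    start = max(start, 1)  # index 0 is the counter, not an item
    out = []
    for key, value in db.seek(pack(start)):
        if unpack(key) > end:
            break
        out.append(value)
    return out


db = SortedStore()
for name in ["tomato", "celery", "apple", "pie", "meat"]:
    append(db, name)

print(list_slice(db, 2, 4))
```

The big-endian format matters here: with a little-endian encoding, the bytes for 256 would sort before the bytes for 1, and range scans would no longer follow numeric order.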

Now let's expand to orders for multiple users. If you can open several hashmaps, you can use the scheme above with one hashmap per user. You may hit system limits, though (maximum number of open file descriptors, or maximum number of files per directory). So you can instead use a single hashmap shared by several users.

What I explain below works for both leveldb and bsddb, provided you pack keys so that their lexicographic (byte) order matches the order you want. So I will assume that you have a pack function. In bsddb you have to build the pack function yourself. Have a look at wiredtiger.packing or bytekey for inspiration.

The principle is to namespace the keys using the user's id. It's also called key composition.
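
A minimal sketch of key composition, assuming user IDs and list indexes both fit in unsigned 64-bit integers. The fixed-width big-endian layout below is just one possible choice for `pack`; it is not the format used by wiredtiger.packing or bytekey.

```python
import struct

# (user_id, index), both big-endian unsigned 64-bit integers.
_KEY = struct.Struct(">QQ")


def pack(user_id, index):
    return _KEY.pack(user_id, index)


def unpack(key):
    return _KEY.unpack(key)


# Keys inserted in arbitrary order...
keys = [pack(32, 1), pack(1, 2), pack(1, 0), pack(32, 0), pack(1, 1)]

# ...sort by bytes into (user_id, index) order, which groups each
# user's rows together with the counter row (index 0) first.
ordered = [unpack(k) for k in sorted(keys)]
print(ordered)
```

Because every key has the same width and the user ID comes first, all rows for one user are contiguous in the store, so a range scan that starts at `pack(user_id, start)` never has to skip over another user's data.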

Say your database looks like the following:

   key   |  value
-------------------
  1  | 0 |    2       <--- count row for user 1
  1  | 1 |  tomato
  1  | 2 |  orange
 ... |...|   ...
  32 | 0 |    1       <--- count row for user 32
  32 | 1 |  banana
 ... |...|   ...

You create this database with the following (pseudo) code:

db.put(pack(1, make_uid(1)), 'tomato')
db.put(pack(1, make_uid(1)), 'orange')
...
db.put(pack(32, make_uid(32)), 'banana')

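The make_uid implementation looks like this:

def make_uid(user_uid):
    # retrieve the current count (stored at index 0 for this user)
    counter_key = pack(user_uid, 0)
    value = db.get(counter_key)
    # a missing counter means the user has no orders yet; this assumes
    # db.get returns None for a missing key and that counts are stored
    # in a form int() can read
    value = 0 if value is None else int(value)
    value += 1  # increment
    # save new count
    db.put(counter_key, value)
    return value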

Then you have to do the correct range lookup. It is similar to the single-list case, except that the key is now composite. Using the bsddb api cursor.set_range(key), we retrieve all items between 5000 and 5050 for user 42:

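def user_orders_slice(user_id, start, end):
    # position the cursor on the first key >= (user_id, start);
    # this assumes the cursor returns None when no record is found
    record = cursor.set_range(pack(user_id, start))
    while record is not None:
        key, value = record
        key_user_id, order_id = unpack(key)
        # stop at the end of the slice or at another user's keys
        if key_user_id != user_id or order_id > end:
            break
        # the value is probably packed somehow...
        yield value
        record = cursor.next()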

Error handling is kept to a minimum. Among other things, user_orders_slice(42, 5000, 5050) is not guaranteed to return 51 items if you delete items from the list. A correct way to query, say, 50 items is to implement a user_orders_query(user_id, start, limit).
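
One way such a limit-based query could look is sketched below. It assumes the same fixed-width big-endian composite key as an illustrative `pack`, and it uses a sorted list of (key, value) pairs named `rows` as a hypothetical stand-in for a cursor over the ordered store; the extra `rows` parameter is part of that assumption.

```python
import bisect
import struct

_KEY = struct.Struct(">QQ")  # (user_id, index), big-endian


def pack(user_id, index):
    return _KEY.pack(user_id, index)


def unpack(key):
    return _KEY.unpack(key)


def user_orders_query(rows, user_id, start, limit):
    """Return up to `limit` values for `user_id`, from index `start` on.

    `rows` is a list of (key, value) pairs sorted by key.
    """
    start = max(start, 1)  # index 0 is the counter row, not an order
    # A 1-tuple probe sorts just before any (key, value) with the same key,
    # so this finds the first row whose key is >= pack(user_id, start).
    i = bisect.bisect_left(rows, (pack(user_id, start),))
    result = []
    for key, value in rows[i:]:
        key_user_id, _ = unpack(key)
        # Stop once the limit is reached or another user's rows begin.
        if key_user_id != user_id or len(result) >= limit:
            break
        result.append(value)
    return result


# User 1 had six orders; the ones at indexes 3 and 4 were deleted.
rows = sorted([
    (pack(1, 0), 6),
    (pack(1, 1), "a"),
    (pack(1, 2), "b"),
    (pack(1, 5), "c"),
    (pack(1, 6), "d"),
    (pack(2, 0), 1),
    (pack(2, 1), "x"),
])

print(user_orders_query(rows, 1, 2, 3))
```

Counting returned items instead of comparing against an end index means gaps left by deleted orders do not shrink the page.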

I hope you get the idea.
