组合键上的范围键查询 [英] Range Key Querying on composed keys

查看:23
本文介绍了组合键上的范围键查询的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

目前我有一个包含以下字段的集合:

Currently I have a collection which contains the following fields:

  • userId
  • otherUserId
  • 日期
  • 状态

对于我的 Dynamo 集合,我使用 userId 作为 hashKey,对于 rangeKey,我想使用 date:otherUserId.通过这样做,我可以检索所有按日期排序的 userId 条目.

For my Dynamo collection I used userId as the hashKey and for the rangeKey I wanted to use date:otherUserId. By doing it like this I could retrieve all userId entries sorted on a date which is good.

但是,对于我的用例,我不应该有任何重复,这意味着我的集合中不应该有相同的 userId-otherUserId 值.这意味着我应该先进行查询以检查该情侣"是否存在,如果需要将其删除,然后再进行插入,对吗?

However, for my usecase I shouldn't have any duplicates, meaning I shouldn't have the same userId-otherUserId value in my collection. This means I should do a query first to check if that 'couple' exist, remove it if needed and then do the insert, right?

已经感谢您的帮助 :-)

Thanks for your help already :-)

我的用例的目标是在 userA 访问 userB 的个人资料时进行存储.

The goal of my usecase would be to store when userA visits the profile of userB.

现在,我想做的查询如下:

Now, The kind of queries I would like to do are the following:

  • 检索访问过 UserA 个人资料的所有 UserB,以唯一(= 没有双重 UserB)并按时间排序.
  • 检索 UserA 和 UserB 的特定对访问

推荐答案

我认为您有很多选择,但基于您的应用程序是时间感知的假设(即您想要查询最近 N 分钟、几小时、几天等的互动.

I think you have a lot of choices, but here is one that might work based on the assumption that your application is time-aware i.e. you want to query for interactions in the last N minutes, hours, days etc.

hash_key = userA
range_key = [iso1860_timestamp][1]+userB+uuid

首先,uuid 技巧是为了避免覆盖用户A 和用户B 之间的交互记录恰好同时发生(可能发生取决于您的时钟的粒度/精度).所以在插入方面我们是安全的:没有重复,没有覆盖.

First, the uuid trick is there to avoid overwriting a record of an interaction between userA and userB happening exactly at the same time (can occur depending on the granularity/precision of your clock). So insert-wise we are safe : no duplicates, no overwrites.

查询方面,事情是这样完成的:

Query-wise, here is how things get done:

  • 检索访问过 UserA 个人资料的所有 UserB,以唯一(= 没有双重 UserB)并按时间排序.

查询(hash_key=userA, range_key_condition=BEGIN(common_prefix))

其中 common_prefix = 2013-01-01,用于 2013 年 1 月的所有交互

where common_prefix = 2013-01-01 for all interactions in Jan 2013

这将检索一个时间范围内的所有记录,并排序(假设它们以正确的顺序插入).然后在 应用程序代码 中过滤它们以仅保留与 userB 相关的那些.不幸的是,DynamoDB API 不支持范围键条件列表(否则您可以通过传递额外的 CONTAINS userB 条件来节省一些时间).

This will retrieve all records in a time range, sorted (assuming they were inserted in the proper order). Then in the application code you filter them to retain only those concerning userB. Unfortunately, DynamoDB API doesn't support a list of range key conditions (otherwise you could just save some time by passing an additional CONTAINS userB condition).

  • 检索 UserA 和 UserB 的特定对访问

查询(hash_key=userA, range_key_condition=BEGINS(common_prefix))

如果我们可以假设您知道交互的时间戳,那么 common_prefix 可能会更精确.

where common_prefix could be much more precise if we can assume you know the timestamp of the interaction.

当然,应该根据您将处理的数据流的属性来评估这种设计.如果您可以(大多数情况下)为您的查询指定一个有意义的时间范围,那么它将很快并且受您在用户A 的时间范围内记录的交互次数的限制.

Of course, this design should be evaluated wrt to the properties of the data stream you will handle. If you can (most often) specify a meaningful time range for your queries, it will be fast and bounded by the number of interactions you have recorded in the time range for userA.

如果您的应用程序不是那么以时间为导向 - 我们可以假设用户通常只有少量交互 - 您可能会切换到以下架构:

If your application is not so time-oriented - and we can assume a user have most often only a few interactions - you might switch to the following schema:

hash_key = userA
range_key = userB+[iso1860_timestamp][1]+uuid

这样可以按用户查询:

查询(hash_key=userA, range_key_condition=BEGIN(userB))

此替代方案将很快并且受限于用户A - 用户B 交互的次数在所有时间范围内,这可能是有意义的,具体取决于您的应用程序.

This alternative will be fast and bounded by the nber of userA - userB interactions over all time ranges, which could be meaningful depending on your application.

因此,基本上您应该检查示例数据并估计哪个方向对您的应用程序有意义.两个方向(时间或用户)也可以通过在其他表中手动创建和维护索引来加速 - 以更复杂的应用程序代码为代价.

So basically you should check example data and estimate which orientation is meaningful for your application. Both orientations (time or user) might also be sped up by manually creating and maintaining indexes in other tables - at the cost of a more complex application code.

(历史版本:避免使用基于时间的键覆盖记录的技巧)在您的情况下,一个常见的技巧是使用生成的唯一 id (uuid) 对范围键进行后缀.这样您仍然可以使用 BETWEEN 条件进行 query 调用以检索在给定时间段内插入的记录,并且您无需担心插入时的键冲突时间.

(historical version: trick to avoid overwriting records with time-based keys) A common trick in your case is to postfix the range key with a generated unique id (uuid). This way you can still do query calls with BETWEEN condition to retrieve records that were inserted in a given time period, and you don't need to worry about key collision at insertion time.

这篇关于组合键上的范围键查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆