Range Key Querying on composed keys

Problem description

Currently I have a collection which contains the following fields:


  • userId
  • otherUserId
  • date
  • status

For my DynamoDB table I used userId as the hash key and date:otherUserId as the range key. This way I can retrieve all entries for a userId sorted by date, which is good.

However, for my use case I shouldn't have any duplicates: the same userId-otherUserId pair should appear only once in my collection. So I should first run a query to check whether that pair exists, delete it if needed, and then do the insert, right?

Edit:

Thanks for your help :-)

The goal of my use case is to store when userA visits the profile of userB.

Now, the kinds of queries I would like to do are the following:


  • Retrieve all userBs who visited userA's profile, each userB only once (= no duplicate userB), sorted by time.

  • Retrieve the visits of a particular userA-userB pair.

Recommended answer

I think you have a lot of choices, but here is one that might work, based on the assumption that your application is time-aware, i.e. you want to query for interactions in the last N minutes, hours, days, etc.

hash_key = userA
range_key = iso8601_timestamp + userB + uuid
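Below is a minimal Python sketch of building this range key. It rests on a few assumptions:

- The timestamp is written as a fixed-width ISO 8601 string in UTC. With a fixed width and a single time zone, string order matches time order.
- The parts are joined with `#`. The answer only says they are concatenated, so the separator is an assumption, and user IDs are assumed never to contain it.
- The function names are hypothetical.

```python
import uuid
from datetime import datetime, timezone
from typing import Tuple

SEP = "#"  # hypothetical separator; assumes user IDs never contain "#"


def iso_utc(ts: datetime) -> str:
    """Fixed-width ISO 8601 string in UTC, so string order matches time order.

    Expects a timezone-aware datetime.
    """
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def time_first_range_key(visited_at: datetime, user_b: str) -> str:
    # timestamp first, then the visitor, then a random uuid so keys never collide
    return SEP.join([iso_utc(visited_at), user_b, uuid.uuid4().hex])


def split_range_key(range_key: str) -> Tuple[str, str, str]:
    # inverse of time_first_range_key: (timestamp, userB, uuid)
    timestamp, user_b, unique_id = range_key.split(SEP)
    return timestamp, user_b, unique_id
```

Calling `time_first_range_key` twice with the same arguments yields two different keys. Both keys share the `2013-01-05T12:00:00.000000Z#userB#` prefix when the timestamp is noon UTC on 2013-01-05.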

First, the uuid trick is there to avoid overwriting the record of an interaction between userA and userB with another one that happens at exactly the same time (which can occur depending on the granularity/precision of your clock). So insert-wise we are safe: no duplicate keys, no overwrites.
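To see why the uuid matters, here is a small in-memory simulation. It assumes that writing an item with an existing primary key replaces the old item. DynamoDB's `PutItem` behaves this way unless a condition expression is supplied. The dictionary stands in for the table, and `put_visit` is a hypothetical helper.

```python
import uuid

# (hash_key, range_key) -> item; a plain dict reproduces "same key overwrites"
table = {}


def put_visit(user_a: str, user_b: str, timestamp: str, add_uuid: bool) -> None:
    parts = [timestamp, user_b]
    if add_uuid:
        parts.append(uuid.uuid4().hex)  # makes every range key unique
    table[(user_a, "#".join(parts))] = {"visitor": user_b, "at": timestamp}


same_tick = "2013-01-05T12:00:00.000000Z"

# Two visits recorded at the same clock tick, no uuid: the second replaces the first.
put_visit("userA", "userB", same_tick, add_uuid=False)
put_visit("userA", "userB", same_tick, add_uuid=False)
count_without_uuid = len(table)

table.clear()

# The same two visits with a uuid suffix: both records are kept.
put_visit("userA", "userB", same_tick, add_uuid=True)
put_visit("userA", "userB", same_tick, add_uuid=True)
count_with_uuid = len(table)

print(count_without_uuid, count_with_uuid)  # 1 2
```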

Query-wise, here is how things get done:



  • Retrieve all userBs who visited userA's profile, each userB only once (= no duplicate userB), sorted by time.

query(hash_key = userA, range_key_condition = BEGINS_WITH(common_prefix))

where common_prefix = 2013-01 for all interactions in January 2013 (a prefix of 2013-01-01 would only match January 1st)
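Here is a sketch with the low-level boto3 DynamoDB client, where this condition is written as `begins_with(...)` inside a `KeyConditionExpression`. The table name `Visits` and the attribute names `user_id`/`visit_key` are hypothetical. The function only builds the keyword arguments that you would pass to `client.query(**kwargs)`.

```python
from typing import Any, Dict


def prefix_query_kwargs(user_a: str, prefix: str, table: str = "Visits") -> Dict[str, Any]:
    """Keyword arguments for boto3's low-level client.query().

    Table and attribute names are hypothetical. The range key condition is a
    single begins_with(), the only prefix operator a key condition allows.
    """
    return {
        "TableName": table,
        "KeyConditionExpression": "#pk = :pk AND begins_with(#sk, :prefix)",
        "ExpressionAttributeNames": {"#pk": "user_id", "#sk": "visit_key"},
        "ExpressionAttributeValues": {
            ":pk": {"S": user_a},
            ":prefix": {"S": prefix},
        },
        "ScanIndexForward": True,  # ascending range-key order, i.e. oldest first
    }


# All of userA's interactions in January 2013:
kwargs = prefix_query_kwargs("userA", "2013-01")
# With boto3 installed and credentials configured:
#   import boto3
#   response = boto3.client("dynamodb").query(**kwargs)
```

A single query call returns at most 1 MB of data. Larger result sets have to be paged by passing the response's `LastEvaluatedKey` back as `ExclusiveStartKey`.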

This will retrieve all records in the time range, sorted by timestamp: DynamoDB returns query results in range-key order, and the range key starts with the timestamp. Then, in the application code, you filter them by userB, e.g. keeping a single record per userB to get the unique list. Unfortunately, the DynamoDB API doesn't support a list of range key conditions (otherwise you could save some time by passing an additional CONTAINS userB condition).
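The application-side step can be sketched in plain Python. It assumes range keys of the form `timestamp#userB#uuid`, where the `#` separator is an assumption. The sketch keeps one entry per userB (here, the most recent visit) and returns the entries newest first. `unique_visitors` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def unique_visitors(range_keys: List[str]) -> List[Tuple[str, str]]:
    """One (userB, latest_visit_timestamp) pair per userB, newest first.

    range_keys are "timestamp#userB#uuid" strings, e.g. the range keys
    returned by the prefix query.
    """
    latest: Dict[str, str] = {}
    for key in range_keys:
        timestamp, user_b, _unique_id = key.split("#")
        if user_b not in latest or timestamp > latest[user_b]:
            latest[user_b] = timestamp  # keep only the most recent visit
    return sorted(latest.items(), key=lambda pair: pair[1], reverse=True)


# Short uuids for readability.
keys = [
    "2013-01-02T08:00:00.000000Z#bob#a1",
    "2013-01-03T09:30:00.000000Z#carol#b2",
    "2013-01-04T10:15:00.000000Z#bob#c3",
]
print(unique_visitors(keys))
# [('bob', '2013-01-04T10:15:00.000000Z'), ('carol', '2013-01-03T09:30:00.000000Z')]
```

Keeping the latest visit per userB is a design choice. Keeping the first visit would only change the comparison.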



  • Retrieve the visits of a particular userA-userB pair.

query(hash_key = userA, range_key_condition = BEGINS_WITH(common_prefix))

where common_prefix could be much more precise if we can assume you know the timestamp of the interaction.
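A sketch of the pair lookup follows the same pattern. First narrow the prefix as far as the known timestamp allows, then keep only the keys whose userB part matches. The `timestamp#userB#uuid` layout and the helper name are assumptions. The filtering happens client-side because, as noted above, the key condition cannot also test for userB.

```python
from typing import List


def pair_visits(range_keys: List[str], user_b: str) -> List[str]:
    """Keep only the range keys ("timestamp#userB#uuid") whose userB part equals user_b."""
    return [key for key in range_keys if key.split("#")[1] == user_b]


# Keys returned by a query with the narrower prefix "2013-01-04" (short uuids):
keys = [
    "2013-01-04T10:15:00.000000Z#bob#c3",
    "2013-01-04T11:00:00.000000Z#dave#d4",
    "2013-01-04T18:45:00.000000Z#bob#e5",
]
print(pair_visits(keys, "bob"))
# ['2013-01-04T10:15:00.000000Z#bob#c3', '2013-01-04T18:45:00.000000Z#bob#e5']
```

The sketch compares the whole userB field rather than testing for a substring. A plain "contains bob" check would also match a user called `bobby`.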

Of course, this design should be evaluated with respect to the properties of the data stream you will handle. If you can (most often) specify a meaningful time range for your queries, they will be fast, and their cost will be bounded by the number of interactions recorded for userA in that time range.

If your application is not so time-oriented, and we can assume that a user most often has only a few interactions, you might switch to the following schema:

hash_key = userA
range_key = userB + iso8601_timestamp + uuid
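The user-first variant only changes the order of the parts. Here is a sketch with a fixed-width UTC timestamp and `#` as the separator. Both are assumptions, as is the function name.

```python
import uuid
from datetime import datetime, timezone


def user_first_range_key(user_b: str, visited_at: datetime) -> str:
    # userB first, so one visitor's visits are stored next to each other,
    # then a fixed-width UTC timestamp so those visits sort chronologically,
    # then a random uuid so simultaneous visits do not overwrite each other
    timestamp = visited_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return "#".join([user_b, timestamp, uuid.uuid4().hex])
```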

This way you can query by user:

query(hash_key = userA, range_key_condition = BEGINS_WITH(userB))
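When the prefix is a user ID, it is worth including the separator in it, because a prefix of `bob` alone would also match the visits of `bobby`. Below is a small in-memory check of that. It assumes range keys are `userB#timestamp#uuid` strings joined with `#` (an assumed separator). `visits_by` is a hypothetical helper that emulates the `begins_with` condition on a sorted list.

```python
from typing import List


def visits_by(range_keys: List[str], user_b: str) -> List[str]:
    """Emulate begins_with(range_key, user_b + "#") on a list of range keys."""
    prefix = user_b + "#"  # the separator stops "bob" from matching "bobby"
    return [key for key in range_keys if key.startswith(prefix)]


# Range keys in sorted order, as DynamoDB stores them (short uuids):
keys = sorted([
    "bob#2013-01-02T08:00:00.000000Z#a1",
    "bobby#2013-01-03T09:30:00.000000Z#b2",
    "bob#2013-02-10T17:05:00.000000Z#c3",
])
print(visits_by(keys, "bob"))
# ['bob#2013-01-02T08:00:00.000000Z#a1', 'bob#2013-02-10T17:05:00.000000Z#c3']
```

In a real query, the same idea means passing `userB + "#"` as the `begins_with` value.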

This alternative will be fast and bounded by the number of userA-userB interactions over all time ranges, which could be meaningful depending on your application.

So basically you should check example data and estimate which orientation is meaningful for your application. Both orientations (time or user) might also be sped up by manually creating and maintaining indexes in other tables, at the cost of more complex application code.
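Maintaining such an index by hand means writing every visit twice, once per orientation. Here is a minimal in-memory sketch. The two dicts stand in for two hypothetical tables, `visits_by_time` and `visits_by_user`. The same uuid is reused so that both rows describe the same visit. With two separate writes, a failure between them can leave the tables out of sync, which is part of the extra complexity mentioned above.

```python
import uuid
from typing import Dict, Tuple

Key = Tuple[str, str]  # (hash_key, range_key)

# Stand-ins for two hypothetical tables, one per orientation.
visits_by_time: Dict[Key, dict] = {}
visits_by_user: Dict[Key, dict] = {}


def record_visit(user_a: str, user_b: str, timestamp: str) -> None:
    visit_id = uuid.uuid4().hex  # shared, so both rows describe the same visit
    item = {"userA": user_a, "userB": user_b, "at": timestamp, "id": visit_id}
    # time-first range key for time-window queries
    visits_by_time[(user_a, f"{timestamp}#{user_b}#{visit_id}")] = item
    # user-first range key for per-visitor queries
    visits_by_user[(user_a, f"{user_b}#{timestamp}#{visit_id}")] = item


record_visit("userA", "bob", "2013-01-04T10:15:00.000000Z")
record_visit("userA", "carol", "2013-01-05T09:00:00.000000Z")
print(len(visits_by_time), len(visits_by_user))  # 2 2
```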

(Historical version: trick to avoid overwriting records with time-based keys) A common trick in your case is to suffix the range key with a generated unique id (uuid). This way you can still run query calls with a BETWEEN condition to retrieve records that were inserted in a given time period, and you don't need to worry about key collisions at insertion time.
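Here is a sketch of such a time-window query for the low-level boto3 client. It only builds the keyword arguments for `client.query(**kwargs)`, and the table and attribute names are hypothetical. `BETWEEN` is inclusive on both ends. Because each range key is a timestamp followed by more characters, bare date strings work as bounds. For example, a key starting with `2013-02-01T...` sorts after the string `2013-02-01` itself, so it falls outside that upper bound.

```python
from typing import Any, Dict


def time_window_query_kwargs(user_a: str, start: str, end: str, table: str = "Visits") -> Dict[str, Any]:
    """Keyword arguments for boto3's low-level client.query().

    Assumes range keys of the form "timestamp#userB#uuid"; table and
    attribute names are hypothetical. BETWEEN is inclusive on both ends.
    """
    return {
        "TableName": table,
        "KeyConditionExpression": "#pk = :pk AND #sk BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#pk": "user_id", "#sk": "visit_key"},
        "ExpressionAttributeValues": {
            ":pk": {"S": user_a},
            ":start": {"S": start},
            ":end": {"S": end},
        },
    }


# Every visit in January 2013 (keys from 2013-02-01T... sort after "2013-02-01"):
kwargs = time_window_query_kwargs("userA", "2013-01-01", "2013-02-01")
```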
