Retrieve all items with a column beginning with specified text on DynamoDB

Problem description

I have a table in DynamoDB:

Id: int, hash key
Name: string

(There are many more columns, but I have omitted them.)

Typically I just pull out and update items by their Id, and this schema works fine for that.

However, one of the requirements is to have an auto-completing drop-down box based on the name. I want to be able to query this DynamoDB table for all items whose Name starts with a query string.

The SQL way of solving this would be to just add an index on Name and write a query like SELECT Id FROM table WHERE Name LIKE 'query%', but I can't figure out a DynamoDB-friendly way of doing this.

I have considered a few ways to solve this:

  1. Scan the table. This is the easiest option, but least efficient. There's a bit more data in this table than I would be comfortable frequently scanning.
  2. Scan + cache it in memory. But then I have to worry about cache invalidation etc.
  3. Make Name a range key, which supports a begins_with function on the query. However, I'd still have to Scan the table since I want to retrieve results for every single hash key, so this doesn't really work.
  4. Make a global secondary index and query it only with the range key. This also doesn't appear to be possible. I could have a column with a static value and use that as the hash key for the GSI, but that seems like a really ugly hack.
  5. Use a full text search engine like CloudSearch, but this seems like massive overkill for my use case.

Is there a simple solution to this issue?

Recommended answer

The use case you described is not directly supported by DynamoDB's Query operation today - a Query requires you to specify a hash key value, and only then can you apply a condition on the range key.

However, there is a popular scatter-gather technique that is commonly used for use cases such as yours. In this case, you would add an attribute bucket_id and create a global secondary index with bucket_id as the hash key and Name as the range key.

The bucket_id refers to a fixed range of IDs or numbers, with enough cardinality to ensure your global secondary index is well-distributed. For instance, bucket_id could range from 0 to 99. Then when updating your base table, whenever a new entry is added, a random bucket_id between 0 and 99 is assigned to it.
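The write-side step described above could look roughly like the sketch below. It assumes Python with a boto3 Table resource passed in as `table`; the helper names `with_bucket` and `put_with_bucket` are made up for illustration and are not part of the original answer.

```python
import random

NUM_BUCKETS = 100  # bucket_id values 0..99, as in the example above


def with_bucket(item, num_buckets=NUM_BUCKETS):
    """Return a copy of `item` with a random bucket_id attribute added."""
    tagged = dict(item)  # copy so the caller's dict is not mutated
    tagged["bucket_id"] = random.randrange(num_buckets)
    return tagged


def put_with_bucket(table, item):
    """Write `item` with a bucket_id attached.

    `table` is assumed to behave like a boto3 DynamoDB Table resource,
    i.e. it exposes put_item(Item=...).
    """
    table.put_item(Item=with_bucket(item))
```

The bucket only needs to be chosen once, when the item is created; updates by Id can leave bucket_id untouched.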

During your autocomplete query, the application would send 100 separate queries (scatter), one for each bucket_id value (0 to 99), each using BEGINS_WITH on the range key Name. After the results are retrieved, the application would have to combine the 100 sets of responses and re-sort as necessary (gather).
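A minimal sketch of the scatter and gather steps follows, assuming Python and a low-level boto3 DynamoDB client passed in as `client`. The table name, the index name `bucket_id-Name-index`, and the function names are assumptions for illustration. `Name` is aliased as `#n` because NAME is a DynamoDB reserved word.

```python
from concurrent.futures import ThreadPoolExecutor


def query_bucket(client, table_name, index_name, bucket_id, prefix):
    """Query one bucket of the index for Names starting with `prefix`."""
    params = {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "bucket_id = :b AND begins_with(#n, :p)",
        "ExpressionAttributeNames": {"#n": "Name"},
        "ExpressionAttributeValues": {
            ":b": {"N": str(bucket_id)},
            ":p": {"S": prefix},
        },
    }
    items = []
    while True:
        page = client.query(**params)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:  # no more pages for this bucket
            return items
        params["ExclusiveStartKey"] = last_key


def autocomplete(client, table_name, prefix,
                 index_name="bucket_id-Name-index",  # hypothetical name
                 num_buckets=100, max_workers=10):
    """Scatter one query per bucket, then gather and sort by Name."""
    def one(bucket_id):
        return query_bucket(client, table_name, index_name, bucket_id, prefix)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_bucket = list(pool.map(one, range(num_buckets)))
    merged = [item for items in per_bucket for item in items]
    # Low-level client responses use typed values, e.g. {"S": "Alice"}.
    return sorted(merged, key=lambda item: item["Name"]["S"])
```

With real credentials this would be called as `autocomplete(boto3.client("dynamodb"), "MyTable", "Al")`, where `MyTable` is a placeholder. The low-level client is used here rather than a Table resource because it can be shared across the worker threads.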

The above process may seem a bit cumbersome, but it allows your system/table to scale well by ensuring the load is evenly distributed over a fixed key range. You can increase the bucket_id range as appropriate. To save cost, you can choose to project KEYS_ONLY onto your global secondary index, so that the cost of querying is minimized.
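For reference, a sketch of how such an index could be added with a KEYS_ONLY projection, assuming Python and boto3's `update_table` call. The function name and the index name are made up for illustration; on a provisioned-capacity table a `ProvisionedThroughput` setting for the index is also needed, while on an on-demand table it is omitted.

```python
def build_gsi_update(table_name,
                     index_name="bucket_id-Name-index",  # hypothetical name
                     throughput=None):
    """Build keyword arguments for client.update_table() that add the index."""
    create = {
        "IndexName": index_name,
        "KeySchema": [
            {"AttributeName": "bucket_id", "KeyType": "HASH"},
            {"AttributeName": "Name", "KeyType": "RANGE"},
        ],
        # KEYS_ONLY copies only the index keys and the table's primary key (Id).
        "Projection": {"ProjectionType": "KEYS_ONLY"},
    }
    if throughput is not None:
        # e.g. {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5}
        create["ProvisionedThroughput"] = throughput
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "bucket_id", "AttributeType": "N"},
            {"AttributeName": "Name", "AttributeType": "S"},
        ],
        "GlobalSecondaryIndexUpdates": [{"Create": create}],
    }


# Usage (requires AWS credentials; "MyTable" is a placeholder):
#   import boto3
#   boto3.client("dynamodb").update_table(**build_gsi_update("MyTable"))
```

Since the autocomplete only needs Id and Name, a KEYS_ONLY index already returns everything the drop-down requires.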
