Retrieve all items with a column beginning with specified text on DynamoDB

Views: 77  Posted: 2020/6/4 0:24:42  Tag: amazon-dynamodb


Problem description

I have a table in DynamoDB:
```
Id: int, hash key
Name: string
```
(there are many more columns, but I omitted them)

Typically I just pull out and update items by their Id, and this schema works fine for that.

However, one of the requirements is to have an auto-completing drop down box based on the name. I want to be able to query all items in this DynamoDB table for Name columns starting with a query string.

The SQL way of solving this would be to just add an index on Name and write a query like SELECT Id FROM table WHERE Name LIKE 'query%', but I can't figure out a DynamoDB-friendly way of doing this.

I have considered a few ways to solve this:
1. Scan the table. This is the easiest option, but least efficient. There's a bit more data in this table than I would be comfortable frequently scanning.
2. Scan + cache it in memory. But then I have to worry about cache invalidation etc.
3. Make Name a range key, which supports a begins_with function on the query. However, I'd still have to Scan the table since I want to retrieve results for every single hash key, so this doesn't really work.
4. Make a global secondary index and query it only with the range key. This also doesn't appear to be possible. I could have a column with a static value and use that as the hash key for the GSI, but that seems like a really ugly hack.
5. Use a full text search engine like CloudSearch, but this seems like massive overkill for my use case.

Is there a simple solution to this issue?

Solution

DynamoDB's Query operation does not directly support the use case you described: it requires you to specify a hash key value and then apply a condition on the range key.

However, there is a popular scatter-gather technique that is commonly used for use cases such as yours. In this case, you would add an attribute bucket_id and create a global secondary index with bucket_id as the hash key and Name as the range key.

The bucket_id refers to a fixed range of IDs or numbers, with enough cardinality to ensure your global secondary index is well-distributed. For instance, bucket_id could range from 0 to 99. Then when updating your base table, whenever a new entry is added, a random bucket_id between 0 and 99 is assigned to it.
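The write-side step can be sketched as below. This is a minimal illustration rather than a definitive implementation: the helper name with_bucket, the constant NUM_BUCKETS, and the table name "Items" in the commented usage are assumptions made for the example.

```python
import random

NUM_BUCKETS = 100  # bucket_id values 0..99, as in the example above


def with_bucket(item, num_buckets=NUM_BUCKETS, rng=random):
    """Return a copy of `item` with a random bucket_id in [0, num_buckets)."""
    tagged = dict(item)  # copy so the caller's dict is not modified
    tagged["bucket_id"] = rng.randrange(num_buckets)
    return tagged


# Hypothetical usage with boto3 (the table name "Items" is an assumption):
# import boto3
# table = boto3.resource("dynamodb").Table("Items")
# table.put_item(Item=with_bucket({"Id": 1, "Name": "Alice"}))
```

Existing items would also need a bucket_id before they show up in the index, since an item without the index's key attributes is not written to a global secondary index.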

During your autocomplete query, the application would send 100 separate queries (scatter), one for each bucket_id value (0 to 99), each using BEGINS_WITH on the range key Name. After the results are retrieved, the application would have to combine the 100 sets of responses and re-sort them as necessary (gather).
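One possible shape for the scatter and gather steps is sketched below. The index name "bucket_id-Name-index" and the helpers query_params, make_run_query, and autocomplete are hypothetical names chosen for this example. The query is injected as a callable so the merge logic can be exercised without a live table. Name is aliased as #n because it is a DynamoDB reserved word.

```python
from concurrent.futures import ThreadPoolExecutor


def query_params(bucket_id, prefix, index_name="bucket_id-Name-index"):
    """Build Query arguments for one bucket (index name is an assumption)."""
    return {
        "IndexName": index_name,
        "KeyConditionExpression": "bucket_id = :b AND begins_with(#n, :p)",
        "ExpressionAttributeNames": {"#n": "Name"},  # Name is a reserved word
        "ExpressionAttributeValues": {":b": bucket_id, ":p": prefix},
    }


def make_run_query(table):
    """Wrap a boto3 Table resource so each call returns all pages of a query."""
    def run_query(**kwargs):
        items = []
        while True:
            response = table.query(**kwargs)
            items.extend(response["Items"])
            if "LastEvaluatedKey" not in response:
                return items
            kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]
    return run_query


def autocomplete(prefix, run_query, num_buckets=100, max_workers=10):
    """Scatter one query per bucket, then gather and sort the results by Name."""
    def query_bucket(bucket_id):
        return run_query(**query_params(bucket_id, prefix))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_bucket = list(pool.map(query_bucket, range(num_buckets)))
    merged = [item for bucket_items in per_bucket for item in bucket_items]
    return sorted(merged, key=lambda item: item["Name"])


# Hypothetical usage (the table name "Items" is an assumption):
# import boto3
# table = boto3.resource("dynamodb").Table("Items")
# matches = autocomplete("Al", make_run_query(table))
```

Each bucket's results already arrive sorted by Name, so a k-way merge such as heapq.merge could replace the final sort. A plain sort keeps the sketch short.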

The above process may seem a bit cumbersome, but it allows your system/table to scale well by ensuring the load is evenly distributed over a fixed key range. You can increase the bucket_id range as appropriate. To save cost, you can choose to project KEYS_ONLY onto your global secondary index, so that the cost of querying is minimized.
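As a sketch of the index definition, the function below builds the arguments for the UpdateTable call that adds such an index with a KEYS_ONLY projection. The function name gsi_create_request, the index name, and the capacity values are assumptions for illustration. The ProvisionedThroughput entry applies to tables in provisioned mode and would be left out for an on-demand table.

```python
def gsi_create_request(table_name, index_name="bucket_id-Name-index",
                       read_units=5, write_units=5):
    """Build UpdateTable arguments that add the bucket_id/Name index."""
    return {
        "TableName": table_name,
        # Attributes used as index keys must be declared with their types.
        "AttributeDefinitions": [
            {"AttributeName": "bucket_id", "AttributeType": "N"},
            {"AttributeName": "Name", "AttributeType": "S"},
        ],
        "GlobalSecondaryIndexUpdates": [{
            "Create": {
                "IndexName": index_name,
                "KeySchema": [
                    {"AttributeName": "bucket_id", "KeyType": "HASH"},
                    {"AttributeName": "Name", "KeyType": "RANGE"},
                ],
                # KEYS_ONLY projects the index keys plus the table's primary key.
                "Projection": {"ProjectionType": "KEYS_ONLY"},
                # Assumed capacity values; omit for on-demand tables.
                "ProvisionedThroughput": {
                    "ReadCapacityUnits": read_units,
                    "WriteCapacityUnits": write_units,
                },
            }
        }],
    }


# Hypothetical usage (the table name "Items" is an assumption):
# import boto3
# boto3.client("dynamodb").update_table(**gsi_create_request("Items"))
```

With KEYS_ONLY, each query result carries bucket_id, Name, and the table's key Id, which covers what the autocomplete box needs without reading the other columns.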
