Composite key in DynamoDB with more than 2 columns?

Question

I'm exploring the use of DynamoDB in the application I work on, which currently only has one database component -- a MySQL database running on RDS.

We use AWS pretty heavily and use a sharding scheme for our databases, but it can only get us so far without manual intervention. Playing around with Aurora, I actually saw a significant drop in performance compared with our MySQL database, so I'm evaluating DynamoDB to see if it will work for us, as it can efficiently store JSON data and also scale easily (just increase the reads or writes per second in the AWS console and let Amazon do the heavy lifting).

In several of our MySQL tables we have a primary key that is an autoincrement column, but we also have several indices on top of that to support query performance in other ways. The other indices are crucial as some of our tables have over 1 billion rows in them. In essence, we partition things among a client, an object_name, etc. So I might do something like this in MySQL:

CREATE TABLE `record` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `client_id` int(10) unsigned NOT NULL,
  `data_id_start` bigint(20) unsigned NOT NULL,
  `data_id_end` bigint(20) unsigned NOT NULL DEFAULT '8888888888888888',
  `object_name` varchar(255) NOT NULL,
  `uuid` varchar(255) NOT NULL,
  `deleted` tinyint(1) unsigned NOT NULL DEFAULT '0',
  ...
  PRIMARY KEY (`id`),
  ...
  KEY `client_id_object_name_data_id_data_id_end_deleted` (`client_id`,`object_name`,`data_id_start`,`data_id_end`,`deleted`),
  KEY `client_id_object_name_data_id_end_uuid_id` (`client_id`,`object_name`,`data_id_end`,`uuid_id`),
  ...
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

I'm evaluating duplicating some of this data into DynamoDB to use as a cache, so we don't have to go out to S3 to retrieve stored data there under certain situations. Instead, I'll just store the JSON data directly in the cache. In DynamoDB, it looks like I could use a HASH or a HASH and RANGE attribute in a key. So for example, I could use the autoincrement column from our MySQL table as the HASH, but then all of the examples I see of RANGE keys, global/local secondary indices, etc. only specify ONE other attribute as the RANGE. I want to create an index for efficient lookup when 3 or more values are specified in the "where" clause.

For example, I would like to query this table using an expression like this:

var params = {
    TableName: "Cache",
    KeyConditionExpression: "clientId = :clientId and objectName = :objectName and uuid = :uuid",
    ExpressionAttributeValues: {
        ":clientId": 17,
        ":objectName": "Some name",
        ":uuid": "ABC123-KDJFK3244-CCB"
    }
};

Notice that my "where clause" in the KeyConditionExpression uses 3 values. It's possible that we might have 4 or 5 values there. So is there any way in DynamoDB to create composite keys that have more than 2 attributes (columns) in them?

If not, I suppose that I could concatenate the 3 columns into a string and use that as my primary key on each insert. Or at least concatenate clientId and objectName, then use uuid as a RANGE or something like that. Effectively I need to page through all values for a specific clientId/objectName combination, and then based on some of the attributes in each row either take its value directly from the cache, or consider it a miss and retrieve the value from S3 (which is considerably slower).

Answer

DynamoDB supports consistently low-latency queries over an essentially unlimited amount of data, so it fits this use case. The model you suggested, concatenating the values into a single key, seems to be a good approach.
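
As a rough illustration of the concatenation approach, here is a minimal sketch that builds Query parameters in the same style as the question's `params` object. It assumes a table whose partition (HASH) key attribute is named `pk` and whose sort (RANGE) key attribute is named `sk`. Those attribute names, the `#` delimiter, and both helper functions are hypothetical and do not come from the question.

```javascript
// Hypothetical helper: joins clientId and objectName into one partition key value.
// The "#" delimiter is an assumption; clientId is numeric, so the first "#"
// always marks the boundary between the two parts.
function makePartitionKey(clientId, objectName) {
    return String(clientId) + "#" + objectName;
}

// Builds Query parameters for a table ASSUMED to have a partition key attribute
// "pk" (clientId + objectName) and a sort key attribute "sk" (the uuid).
// Omitting uuid yields a query over every item for that clientId/objectName pair.
function buildQueryParams(clientId, objectName, uuid) {
    var params = {
        TableName: "Cache",
        KeyConditionExpression: "#pk = :pk",
        // Name placeholders sidestep any clash with DynamoDB reserved words.
        ExpressionAttributeNames: { "#pk": "pk" },
        ExpressionAttributeValues: { ":pk": makePartitionKey(clientId, objectName) }
    };
    if (uuid !== undefined) {
        params.KeyConditionExpression += " and #sk = :sk";
        params.ExpressionAttributeNames["#sk"] = "sk";
        params.ExpressionAttributeValues[":sk"] = uuid;
    }
    return params;
}

var params = buildQueryParams(17, "Some name", "ABC123-KDJFK3244-CCB");
```

Calling `buildQueryParams(17, "Some name")` without a uuid produces a key condition on the partition key only, which is the shape needed to page through all values for one clientId/objectName combination.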

One thing to note is that hash key attribute values are limited to 2048 bytes. If the values you are concatenating have unpredictable lengths (so you can't pad them nicely) or exceed this limit, it may be better to hash the concatenated value and look items up by that hash. Here is the relevant documentation on limits: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html. DynamoDB items are also limited to 400 KB of total data.
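
One way to keep the key at a fixed, small size is to hash the concatenated values. The sketch below uses Node's built-in `crypto` module with SHA-256. The choice of SHA-256, the JSON encoding of the parts, and the function name `hashedPartitionKey` are assumptions made for illustration only.

```javascript
var crypto = require("crypto");

// Hypothetical helper: derives a fixed-length partition key from any number
// of key parts. JSON.stringify gives an unambiguous encoding, so
// ["a", "bc"] and ["ab", "c"] produce different inputs to the hash.
function hashedPartitionKey(parts) {
    var canonical = JSON.stringify(parts);
    // A SHA-256 hex digest is always 64 characters, far below the 2048-byte limit.
    return crypto.createHash("sha256").update(canonical, "utf8").digest("hex");
}

var key = hashedPartitionKey([17, "Some name"]);
```

The trade-off is that a hashed key only supports exact-match lookups: the original values can no longer be read back from the key, so they need to be stored as ordinary attributes on the item as well.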

For correctness, I would also use a unique identifier as the range key. This tolerates collisions between hash values (even if they are rare), and the schema scales well because there are only a small number of items per hash key value.
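
To make the collision point concrete, here is a small sketch of one possible item layout. It is an interpretation, not something the answer spells out: the hashed value goes in the partition key, the uuid goes in the range key, and the original values are stored alongside so a reader can discard any item that merely shares the hash. The attribute names `pk`, `sk`, and `payload` and both function names are hypothetical.

```javascript
// Builds a cache item whose partition key is a precomputed hash and whose
// range key is the record's uuid. Two records whose hashes collide still get
// distinct primary keys, because their uuids differ.
function buildCacheItem(hashKey, record, jsonPayload) {
    return {
        pk: hashKey,
        sk: record.uuid,
        clientId: record.clientId,      // stored so reads can verify the match
        objectName: record.objectName,  // stored so reads can verify the match
        payload: jsonPayload
    };
}

// After querying by hash, keep only the items that really belong to the
// requested clientId/objectName pair; anything else is a hash collision.
function filterCollisions(items, clientId, objectName) {
    return items.filter(function (item) {
        return item.clientId === clientId && item.objectName === objectName;
    });
}

var item = buildCacheItem(
    "example-hash",
    { uuid: "ABC123-KDJFK3244-CCB", clientId: 17, objectName: "Some name" },
    '{"cached": true}'
);
```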
