Composite key in DynamoDB with more than 2 columns?

Question

I'm exploring the use of DynamoDB in the application I work on, which currently only has one database component -- a MySQL database running on RDS.

We pretty heavily use AWS and use a sharding scheme for our databases, but it can only get us so far without manual intervention. Playing around with Aurora, I actually saw a significant drop in performance compared to our MySQL database, so I'm evaluating DynamoDB to see whether it will work for us, as it can efficiently store JSON data and also scale easily (just increase the reads or writes per second in the AWS console and let Amazon do the heavy lifting).

In several of our MySQL tables we have a primary key that is an autoincrement column, but we also have several indices on top of that to support query performance in other ways. The other indices are crucial as some of our tables have over 1 billion rows in them. In essence, we partition things among a client, an object_name, etc. So I might do something like this in MySQL:

CREATE TABLE `record` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `client_id` int(10) unsigned NOT NULL,
  `data_id_start` bigint(20) unsigned NOT NULL,
  `data_id_end` bigint(20) unsigned NOT NULL DEFAULT '8888888888888888',
  `object_name` varchar(255) NOT NULL,
  `uuid` varchar(255) NOT NULL,
  `deleted` tinyint(1) unsigned NOT NULL DEFAULT '0',
  ...
  PRIMARY KEY (`id`),
  ...
  KEY `client_id_object_name_data_id_data_id_end_deleted` (`client_id`,`object_name`,`data_id_start`,`data_id_end`,`deleted`),
  KEY `client_id_object_name_data_id_end_uuid_id` (`client_id`,`object_name`,`data_id_end`,`uuid_id`),
  ...
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

I'm evaluating duplicating some of this data into DynamoDB to use as a cache, so we don't have to go out to S3 to retrieve stored data in certain situations. Instead, I'll just store the JSON data directly in the cache. In DynamoDB, it looks like I could use a HASH or a HASH and RANGE attribute in a key. So, for example, I could use the autoincrement column from our MySQL table as the HASH, but all of the examples I see of RANGE keys, global/local secondary indices, etc. only specify ONE other attribute as the RANGE. I want to create an index for efficient lookup when 3 or more values are specified in the "where" clause.

For example, I would like to query this table using an expression like this:

var params = {
    TableName: "Cache",
    KeyConditionExpression: "clientId = :clientId and objectName = :objectName and uuid = :uuid",
    ExpressionAttributeValues: {
        ":clientId": 17,
        ":objectName": "Some name",
        ":uuid": "ABC123-KDJFK3244-CCB"
    }
};

Notice that my "where clause" in the KeyConditionExpression uses 3 values. It's possible that we might have 4 or 5 values there. So is there any way in DynamoDB to create composite keys that have more than 2 attributes (columns) in them?

If not, I suppose that I could concatenate the 3 columns into a string and use that as my primary key on each insert. Or at least concatenate clientId and objectName, then use uuid as a RANGE or something like that. Effectively I need to page through all values for a specific clientId/objectName combination, and then based on some of the attributes in each row either take its value directly from the cache, or consider it a miss and retrieve the value from S3 (which is considerably slower).
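A minimal sketch of the two key layouts described above, written in the same JavaScript style as the query example. The attribute names `pk`/`sk`, the `#` delimiter, and the helper functions are hypothetical. `queryPage` stands in for a real call such as `documentClient.query(params).promise()` from the AWS SDK for JavaScript v2, which returns `Items` and, when more results remain, a `LastEvaluatedKey`; passing that back as `ExclusiveStartKey` fetches the next page.

```javascript
// Hypothetical attribute names: "pk" (HASH key) and "sk" (RANGE key).
const DELIMITER = "#";

// Option 1: all three values concatenated into a single HASH key.
function singleHashKey(clientId, objectName, uuid) {
  return { pk: [clientId, objectName, uuid].join(DELIMITER) };
}

// Option 2: clientId#objectName as the HASH key, uuid as the RANGE key.
function hashAndRangeKey(clientId, objectName, uuid) {
  return { pk: [clientId, objectName].join(DELIMITER), sk: uuid };
}

// Query parameters for one page of a single clientId/objectName combination.
function pageParams(clientId, objectName, exclusiveStartKey) {
  const params = {
    TableName: "Cache",
    KeyConditionExpression: "pk = :pk",
    ExpressionAttributeValues: { ":pk": [clientId, objectName].join(DELIMITER) },
  };
  if (exclusiveStartKey) params.ExclusiveStartKey = exclusiveStartKey;
  return params;
}

// Page through every item for the combination (Option 2 layout).
// queryPage is injected so the sketch runs without AWS credentials.
async function queryAll(queryPage, clientId, objectName) {
  const items = [];
  let startKey;
  do {
    const page = await queryPage(pageParams(clientId, objectName, startKey));
    items.push(...page.Items);
    startKey = page.LastEvaluatedKey; // undefined once the last page is reached
  } while (startKey);
  return items;
}
```

With Option 2, every row for client 17 and "Some name" shares the partition key `17#Some name`, so one paginated query walks them all. The delimiter must not occur in the values themselves (`object_name` is a free-form varchar), so in practice you would escape it or choose a character the data cannot contain.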

Answer

DynamoDB allows consistent, low-latency queries on an essentially unlimited amount of data, so it suits this use case. The model you suggested, concatenating the values into a key, seems to be a good approach.

One thing to note is that hash key attribute values are limited to 2048 bytes. If the values you are concatenating do not have predictable lengths (so you can't pad them cleanly) or could exceed this limit, it may be a better approach to hash the concatenated value and query on that hash instead. Here is the relevant documentation on limits: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html. DynamoDB items are also limited to 400 KB of total data.
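A rough sketch of the hashing idea, assuming Node.js. SHA-256 is an assumed choice of hash, and both helper names are hypothetical. The digest is always 64 hex characters, so the key stays far below the 2048-byte limit no matter how long the inputs are. `JSON.stringify` is used instead of joining with a delimiter so that, for example, `["a#b", "c"]` and `["a", "b#c"]` do not produce the same input to the hash.

```javascript
const crypto = require("crypto");

// Maximum size of a partition (HASH) key value, per the DynamoDB limits page.
const MAX_PARTITION_KEY_BYTES = 2048;

// True if a string key value fits within the partition key size limit.
function fitsPartitionKeyLimit(key) {
  return Buffer.byteLength(key, "utf8") <= MAX_PARTITION_KEY_BYTES;
}

// Fixed-length partition key: SHA-256 hex digest of the key values.
// JSON.stringify gives an unambiguous encoding of the list of values.
function hashedPartitionKey(parts) {
  return crypto
    .createHash("sha256")
    .update(JSON.stringify(parts), "utf8")
    .digest("hex"); // always 64 ASCII characters, i.e. 64 bytes
}
```

For example, `hashedPartitionKey([17, "Some name"])` gives the same 64-character key on every insert and every lookup. The trade-off is that the key is opaque, so you can only look items up by the exact original values, never by prefix.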

For correctness, I would also use some unique identifier as the range key. This keeps items distinct even if two different sets of values produce the same hash (however rare that is), and the schema scales well because each hash key value holds only a small number of items.
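A sketch of how a unique range key plus the stored original values could handle hash collisions, assuming Node.js. It uses the question's `uuid` as the unique identifier. The attribute names (`pk`, `sk`, `data`), the SHA-256-over-JSON key, and all helper functions are hypothetical choices. Because the range key is unique, two groups whose hashes collide still get different primary keys. Comparing the stored `clientId`/`objectName` after a query then discards any item that belongs to the other group.

```javascript
const crypto = require("crypto");

// Hypothetical group key: SHA-256 over a JSON encoding of (clientId, objectName).
function groupKey(clientId, objectName) {
  return crypto
    .createHash("sha256")
    .update(JSON.stringify([clientId, objectName]), "utf8")
    .digest("hex");
}

// Build a cache item: pk = hashed group key, sk = the row's unique uuid.
// The original values are kept as ordinary attributes so matches can be verified.
// The whole item, including the cached JSON, must stay under the 400 KB item limit.
function buildCacheItem(clientId, objectName, uuid, json) {
  return {
    pk: groupKey(clientId, objectName),
    sk: uuid,
    clientId,
    objectName,
    data: json,
  };
}

// After querying on pk, drop items from a different (clientId, objectName)
// pair that happened to hash to the same value.
function keepExactMatches(items, clientId, objectName) {
  return items.filter((it) => it.clientId === clientId && it.objectName === objectName);
}
```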
