Is there a DynamoDB max partition size of 10GB for a single partition key value?


Question

I've read a lot of the DynamoDB documentation on designing partition keys and sort keys, but I think I must be missing something fundamental.

If you have a bad partition key design, what happens when the data for a SINGLE partition key value exceeds 10GB?

The 'Understand Partition Behaviour' section states:

"A single partition can hold approximately 10 GB of data"

How can it partition a single partition key?

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.Partitions

The docs also describe a limit for tables with a local secondary index: the data is limited to 10GB, after which you start getting errors.

"The maximum size of any item collection is 10 GB. This limit does not apply to tables without local secondary indexes; only tables that have one or more local secondary indexes are affected."

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/LSI.html#LSI.ItemCollections

That I can understand. So does DynamoDB have some other magic for partitioning the data of a single partition key once it exceeds 10GB? Or does it just keep growing that partition? And what are the implications of that for your key design?

The background to the question is that I've seen lots of examples of using something like a TenantId as a partition key in a multi-tenant environment. But that seems limiting if a specific TenantId could have more than 10 GB of data.

Surely I must be missing something?

Recommended answer

TL;DR - items can be split even if they have the same partition key value, by including the range key value in the partitioning function.

Long version:

This is a very good question, and it is addressed in the documentation. As the documentation states, items in a DynamoDB table are partitioned into one or more partitions based on their partition key value (which used to be called the hash key), using a hashing function. The number of partitions is derived from the maximum desired total throughput, as well as from the distribution of items in the key space. In other words, if the partition key is chosen so that it distributes items uniformly across the partition key space, the partitions end up holding approximately the same number of items each. That number is approximately equal to the total number of items in the table divided by the number of partitions.
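To make the uniform-distribution point concrete, here is a minimal sketch of hash-based partitioning. It is a toy model only: the function name `partition_for`, the use of MD5, and the equal-width slicing of the hash space are assumptions made for illustration, not DynamoDB's actual internals.

```python
import hashlib

HASH_BITS = 64  # assumed width of the toy hash space


def partition_for(partition_key: str, num_partitions: int) -> int:
    """Map a partition key to a partition index (toy model, not DynamoDB's real function)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    h = int.from_bytes(digest[:8], "big")  # value in [0, 2**64)
    # Slice the hash space into num_partitions contiguous, equal-width ranges.
    return (h * num_partitions) >> HASH_BITS


# With many distinct partition key values, items spread roughly evenly.
num_partitions = 9
counts = [0] * num_partitions
for i in range(9000):
    counts[partition_for(f"user-{i}", num_partitions)] += 1
print(counts)
```

Each count should come out close to 9000 / 9, which is the "total items divided by number of partitions" estimate from the paragraph above.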

The documentation also states that each partition is limited to about 10GB of space, and that once the sum of the sizes of all items stored in a partition grows beyond 10GB, DynamoDB starts a background process that automatically and transparently splits that partition in half, resulting in two new partitions. Once again, if the items are distributed uniformly, this is great, because each new sub-partition ends up holding roughly half of the items in the original partition.

An important aspect of splitting is that each of the two new partitions gets half of the throughput that was available to the original partition.
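The arithmetic can be sketched as follows, taking the description above at face value. The helper names and the 900-unit figure are made up for the example, and the even division of table throughput across partitions is an assumption of this toy model.

```python
def per_partition_throughput(total_units, num_partitions):
    """Assume the table's throughput is divided evenly across its partitions."""
    return [total_units / num_partitions] * num_partitions


def split_partition(shares, index):
    """Replace the partition at `index` with two children that each get half its share."""
    half = shares[index] / 2
    return shares[:index] + [half, half] + shares[index + 1:]


shares = per_partition_throughput(900, 9)  # 100 units per partition
after_split = split_partition(shares, 1)   # split the second partition
print(after_split)
```

The total stays at 900 units, but the two children of the split partition each have only 50, so a hot key that lives in one of them has less headroom than before the split.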

So far we've covered the happy case.

On the flip side, it is possible to have one, or a few, partition key values that correspond to a very large number of items. This usually happens when the table schema uses a sort key and many items share the same partition key value. In that case, a single partition key value could be responsible for items that together take up more than 10 GB, and this will result in a split. DynamoDB will still create two new partitions, but instead of using only the partition key to decide which sub-partition an item should be stored in, it will also use the sort key.

Example

Without loss of generality, and to make things easier to reason about, imagine a table where the partition keys are letters (A-Z) and numbers are used as sort keys.

Imagine that the table has about 9 partitions, so letters A, B, C would be stored in partition 1, letters D, E, F would be in partition 2, etc.

In the diagram below, the partition boundaries are marked h(A0), h(D0) etc. to show that, for instance, the items stored in the first partition are the items whose partition key hashes to a value between h(A0) and h(D0). The 0 is intentional, and comes in handy next.

[ h(A0) ]--------[ h(D0) ]---------[ h(G0) ]-------[ h(J0) ]-------[ h(M0) ]- ..
  |   A    B    C   |       E    F   |   G      I    |   J    K   L  |
  |   1    1    1   |       1    1   |   1      1    |   1    1   1  |
  |   2    2    2   |       2    2   |          2    |        2      |
  |   3         3   |            3   |          3    |               |
  ..                ..               ..              ..              ..
  |            100  |           500  |               |               |
  +-----------------+----------------+---------------+---------------+-- ..

Notice that for most partition key values there are between 1 and 3 items in the table, but there are two partition key values, C and F, that are not looking too good: C has 100 items while F has 500 items.

If items with a partition key value of F keep being added, eventually the partition [h(D0)-h(G0)) will split. To make it possible to split items that share the same hash key, the range key values have to be used, so we end up with the following situation:

..[ h(D0) ]------------/ [ h(F500) ] / ----------[ h(G0) ]- ..
      |       E       F       |           F         |
      |       1       1       |          501        |
      |       2       2       |          502        |
      |               3       |          503        |
      ..                      ..                    ..
      |              500      |         1000        |
.. ---+-----------------------+---------------------+--- ..

The original partition [h(D0)-h(G0)) has been split into [h(D0)-h(F500)) and [h(F500)-h(G0)).

I hope this helps to visualize that items are generally mapped to partitions based on a hash value obtained by applying a hashing function to their partition key value, but, if need be, the value being hashed can include the partition key plus the sort key value as well.
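The split in the example can be imitated with a small sketch. It is only an illustration under stated assumptions: items are modelled as (partition key, sort key) tuples, keys are ordered directly instead of by a hash (as the diagrams do with letters), and the helper name `split_at_median` is hypothetical. It does not claim to reproduce how DynamoDB actually chooses a split point.

```python
def split_at_median(items):
    """Split a partition's items into two halves around the median composite key.

    items: list of (partition_key, sort_key) tuples.
    Returns (boundary, left, right), where left holds keys < boundary
    and right holds keys >= boundary (a half-open range, as in the text).
    """
    ordered = sorted(items)
    boundary = ordered[len(ordered) // 2]
    left = [item for item in ordered if item < boundary]
    right = [item for item in ordered if item >= boundary]
    return boundary, left, right


# Partition [h(D0)-h(G0)) once F has grown to 1000 items: E has 2 items, F dominates.
partition = [("E", 1), ("E", 2)] + [("F", n) for n in range(1, 1001)]
boundary, left, right = split_at_median(partition)
print(boundary, len(left), len(right))
```

Because almost every item shares the partition key F, no boundary expressed in partition keys alone could divide the data evenly. The median lands inside F's item collection, so the boundary has to carry a sort key value as well.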
