Is there a DynamoDB max partition size of 10GB for a single partition key value?

Question

I've read lots of DynamoDB docs on designing partition keys and sort keys, but I think I must be missing something fundamental.

If you have a bad partition key design, what happens when the data for a SINGLE partition key value exceeds 10GB?

The 'Understand Partition Behaviour' section states:

"A single partition can hold approximately 10 GB of data"

How can it partition a single partition key?

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.Partitions

The docs also talk about limits with a local secondary index being limited to 10GB of data after which you start getting errors.

"The maximum size of any item collection is 10 GB. This limit does not apply to tables without local secondary indexes; only tables that have one or more local secondary indexes are affected."

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/LSI.html#LSI.ItemCollections

That I can understand. So does it have some other magic for partitioning the data for a single partition key if it exceeds 10GB? Or does it just keep growing that partition? And what are the implications of that for your key design?

The background to the question is that I've seen lots of examples of using something like a TenantId as a partition key in a multi-tenant environment. But that seems limiting if a specific TenantId could have more than 10 GB of data.

I must be missing something?

Recommended answer

TL;DR - items can be split even if they have the same partition key value by including the range key value into the partitioning function.

Long version:

This is a very good question, and it is addressed in the documentation here and here. As the documentation states, items in a DynamoDB table are partitioned based on their partition key value (which used to be called hash key) into one or multiple partitions, using a hashing function. The number of partitions is derived based on the maximum desired total throughput, as well as the distribution of items in the key space. In other words, if the partition key is chosen such that it distributes items uniformly across the partition key space, the partitions end up having approximately the same number of items each. This number of items in each partition is approximately equal to the total number of items in the table divided by the number of partitions.
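The hash-based placement described above can be illustrated with a small sketch. This is not DynamoDB's internal algorithm: the hash function (MD5), the partition count of 4, and the `partition_for` helper are all assumptions made purely for illustration.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical value; DynamoDB decides the real count itself


def partition_for(partition_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key to a partition index via a hash (illustrative only)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# 10,000 distinct partition keys, i.e. a well-distributed key space
keys = [f"user-{i}" for i in range(10_000)]
counts = Counter(partition_for(k) for k in keys)

for index in sorted(counts):
    print(f"partition {index}: {counts[index]} items")
```

With keys spread this evenly, each partition should end up holding roughly 10,000 / 4 items, which is the "total items divided by number of partitions" behaviour the answer describes.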

The documentation also states that each partition is limited to about 10GB of space. And that once the sum of the sizes of all items stored in any partition grows beyond 10GB, DynamoDB will start a background process that will automatically and transparently split such partitions in half - resulting in two new partitions. Once again, if the items are distributed uniformly, this is great because each new sub-partition will end up holding roughly half the items in the original partition.

An important aspect to splitting is that the throughput of the split-partitions will each be half of the throughput that would have been available for the original partition.
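As a rough sketch of the arithmetic, the toy model below splits a partition once it passes the approximate 10 GB limit and halves its throughput. The `Partition` class and `split_if_needed` function are hypothetical names, and the even 50/50 size split assumes uniformly distributed items.

```python
from dataclasses import dataclass

PARTITION_SIZE_LIMIT_GB = 10.0  # approximate limit quoted from the docs


@dataclass
class Partition:
    size_gb: float
    throughput_units: float  # capacity available to this partition


def split_if_needed(p: Partition) -> list[Partition]:
    """Return [p] unchanged, or two halves once the size limit is exceeded."""
    if p.size_gb <= PARTITION_SIZE_LIMIT_GB:
        return [p]
    # Assumption: items are uniform, so size and throughput both halve.
    return [
        Partition(p.size_gb / 2, p.throughput_units / 2),
        Partition(p.size_gb / 2, p.throughput_units / 2),
    ]


parts = split_if_needed(Partition(size_gb=10.4, throughput_units=1000))
for part in parts:
    print(part)
```

In this model a 10.4 GB partition with 1000 units of throughput becomes two partitions with 500 units each, so a hot key that stays on one side of the split has only half the original headroom.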

So far we've covered the happy case.

On the flip side, it is possible to have one, or a few, partition key values that correspond to a very large number of items. This can usually happen if the table schema uses a sort key and several items hash to the same partition key. In such a case, it is possible that a single partition key could be responsible for items that together take up more than 10 GB. And this will result in a split. In this case DynamoDB will still create two new partitions, but instead of using only the partition key to decide which sub-partition an item should be stored in, it will also use the sort key.

Example

Without loss of generality and to make things easier to reason about, imagine that there is a table where partition keys are letters (A-Z), and numbers are used as sort keys.

Imagine that the table has about 9 partitions, so letters A,B,C would be stored in partition 1, letters D,E,F would be in partition 2, etc.

In the diagram below, the partition boundaries are marked h(A0), h(D0) etc. to show that, for instance, the items stored in the first partition are the items whose partition key hashes to a value between h(A0) and h(D0) - the 0 is intentional, and comes in handy next.

[ h(A0) ]--------[ h(D0) ]---------[ h(G0) ]-------[ h(J0) ]-------[ h(M0) ]- ..
  |   A    B    C   |       E    F   |   G      I    |   J    K   L  |
  |   1    1    1   |       1    1   |   1      1    |   1    1   1  |
  |   2    2    2   |       2    2   |          2    |        2      |
  |   3         3   |            3   |          3    |               |
  ..                ..               ..              ..              ..
  |            100  |           500  |               |               |
  +-----------------+----------------+---------------+---------------+-- ..

Notice that for most partition key values, there are between 1 and 3 items in the table, but there are two partition key values, C and F, that are not looking too good: C has 100 items while F has 500 items.

If items with a partition key value of F keep getting added, eventually the partition [h(D0)-h(G0)) will split. To make it possible to split the items that have the same hash key, the range key values will have to be used, so we'll end up with the following situation:

..[ h(D0) ]------------/ [ h(F500) ] / ----------[ h(G0) ]- ..
      |       E       F       |           F         |
      |       1       1       |          501        |
      |       2       2       |          502        |
      |               3       |          503        |
      ..                      ..                    ..
      |              500      |         1000        |
.. ---+-----------------------+---------------------+--- ..

The original partition [h(D0)-h(G0)) was split into [h(D0)-h(F500)) and [h(F500)-h(G0)).

I hope this helps to visualize that items are generally mapped to partitions based on a hash value obtained by applying a hashing function to their partition key value, but if need be, the value being hashed can include the partition key + a sort key value as well.
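The split in the example can be mimicked with a toy model that orders items by the composite (partition key, sort key) and cuts at the median. This is only a sketch of the idea: `split_partition` is a hypothetical helper, and plain tuple ordering stands in for the h(...) values, which is an assumption rather than how DynamoDB actually computes boundaries.

```python
from typing import List, Tuple

Item = Tuple[str, int]  # (partition key, sort key)


def split_partition(items: List[Item]) -> Tuple[List[Item], List[Item], Item]:
    """Split items at the median composite key; return (left, right, boundary)."""
    ordered = sorted(items)  # order by partition key, then by sort key
    boundary = ordered[len(ordered) // 2]
    left = [item for item in ordered if item < boundary]    # [start, boundary)
    right = [item for item in ordered if item >= boundary]  # [boundary, end)
    return left, right, boundary


# Partition [h(D0)-h(G0)) from the example: E has 2 items, F has grown to 1000.
items = [("E", 1), ("E", 2)] + [("F", n) for n in range(1, 1001)]
left, right, boundary = split_partition(items)

print("boundary:", boundary)
print("left :", left[0], "..", left[-1], f"({len(left)} items)")
print("right:", right[0], "..", right[-1], f"({len(right)} items)")
```

Here the boundary lands on ("F", 500), so items with the same partition key F end up on both sides of the split, which is only possible because the sort key takes part in the decision.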
