How does the DynamoDB partition key work?

Question

I'm trying to understand how partitions are created for DynamoDB tables.

According to this blog, "All items with the same partition key are stored together". So if I have a table with user IDs from 1 to 1000, does that mean I will have 1000 partitions? Or is it up to the "internal hash function", and if so, how do we know how many partitions there will be?

It later suggests appending a random suffix from 1 to 10 to the partition key to distribute data evenly across partitions. But how does it know to query 10 times for a given invoice number? Does that only apply when you have 10 partitions? In this case you could have thousands of invoice numbers. Does that mean the same number of partitions will be created, with one query per partition to look up a single invoice number?
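For reference, the suffix scheme described above can be sketched as follows. This is a minimal illustration, not code from the blog: the in-memory `table` dictionary stands in for a DynamoDB table, and `put_invoice_item`, `query_invoice`, `NUM_SUFFIXES`, and the `invoice.suffix` key format are hypothetical names chosen for the example. It shows that the number of queries comes from the suffix range the application picked, not from the number of partitions.

```python
import random
from collections import defaultdict

# Assumption: the application decides to use suffixes 1..10.
# This number is independent of how many partitions the table has.
NUM_SUFFIXES = 10

# Hypothetical in-memory stand-in for a table: partition key -> items.
table = defaultdict(list)

def put_invoice_item(invoice_number, item, rng=random):
    """Write an item under '<invoice_number>.<random suffix>'."""
    suffix = rng.randint(1, NUM_SUFFIXES)
    table[f"{invoice_number}.{suffix}"].append(item)

def query_invoice(invoice_number):
    """Read every item for an invoice by querying each possible suffix."""
    items = []
    for suffix in range(1, NUM_SUFFIXES + 1):
        # One lookup per suffix: this is where the "10 queries" come from.
        items.extend(table.get(f"{invoice_number}.{suffix}", []))
    return items
```

The reader issues one query per possible suffix because the writer could have used any of them. Thousands of invoice numbers therefore produce thousands of distinct partition key values, which is not the same thing as thousands of partitions.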

Recommended answer

When an Amazon DynamoDB table is created, you can specify the desired throughput in reads per second and writes per second. The table is then provisioned across enough servers (partitions) to provide the requested throughput.

You do not have visibility into the number of partitions created; this is fully managed by DynamoDB. Additional partitions are created as the quantity of data grows or when the provisioned throughput is increased.

Let's say you have requested 1000 reads per second and the data has been internally partitioned across 10 servers (10 partitions). Each partition will provide 100 reads per second. If all read requests are for the same partition key, they all land on one partition, so throughput will be limited to 100 reads per second. If the requests are spread over a range of different key values, the throughput can reach the full 1000 reads per second.
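The arithmetic in this example can be written out as a small sketch. It uses the simplified model from the paragraph above, where provisioned throughput is split evenly across partitions; the function names are hypothetical, and the best case assumes every distinct key happens to land on a different partition. The real service may behave differently.

```python
def per_partition_reads(provisioned_reads, partitions):
    """Even split of provisioned throughput, as in the example above."""
    return provisioned_reads / partitions

def max_achievable_reads(provisioned_reads, partitions, distinct_keys_hit):
    """Upper bound on reads per second under the simplified model.

    Assumption: each distinct key lands on a different partition
    (best case), so at most min(keys, partitions) partitions are busy.
    """
    busy_partitions = min(distinct_keys_hit, partitions)
    return busy_partitions * per_partition_reads(provisioned_reads, partitions)

# All requests for a single partition key: limited to one partition's share.
single_key = max_achievable_reads(1000, 10, distinct_keys_hit=1)

# Requests spread over many keys: the full provisioned throughput is reachable.
many_keys = max_achievable_reads(1000, 10, distinct_keys_hit=500)
```

Under these assumptions, `single_key` is 100 reads per second and `many_keys` is 1000 reads per second, which matches the figures in the example.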

If many queries are made for the same partition key, the result can be a hot partition that limits the total available throughput.

Think of it like a bank with lines in front of teller windows. If everybody lines up at one teller, fewer customers can be served. It is more efficient to distribute customers across many different teller windows. A good partition key for distributing customers might be the customer number, since it is different for each customer. A poor partition key might be their zip code, because they all live in the same area near the bank.

The simple rule is that you should choose a partition key that has a wide range of different values.
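To illustrate the rule, here is a rough simulation of how keys might spread over partitions. DynamoDB's internal hash function is not public, so MD5 is used purely as a stand-in, and the partition count of 10, the two zip codes, and the function names are assumptions made for the example.

```python
import hashlib
from collections import Counter

def partition_for(key, partitions):
    """Map a key to a partition index. MD5 is only a stand-in for
    DynamoDB's internal hash function, which is not documented."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % partitions

def load_per_partition(keys, partitions):
    """Count how many keys land on each partition."""
    counts = Counter(partition_for(k, partitions) for k in keys)
    return [counts.get(p, 0) for p in range(partitions)]

customers = range(1, 1001)

# Good key: customer number, different for every customer.
by_customer_number = load_per_partition(customers, 10)

# Poor key: zip code, where everyone lives in one of two nearby areas.
zip_codes = ["90210" if c % 2 else "90211" for c in customers]
by_zip_code = load_per_partition(zip_codes, 10)
```

With 1000 distinct customer numbers, the items spread over the 10 partitions, and many keys share each partition, so 1000 key values do not imply 1000 partitions. With only two zip codes, at most two partitions receive any traffic, however many partitions exist.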

See: Partitions and Data Distribution
