How does the DynamoDB partition key work?

Question

I'm trying to understand how partitions are created for DynamoDB tables.

According to this blog, "All items with the same partition key are stored together", so if I have a table with user IDs from 1 to 1000, does that mean I will have 1000 partitions? Or is it up to the "internal hash function"? If so, how do we know how many partitions there will be?

It later suggests using a random suffix from 1 to 10 to distribute data evenly across partitions, but how does it know it will have to query 10 times for a given invoice number? Is that only when you have 10 partitions? In this case you could have thousands of invoice numbers. Does that mean the same number of partitions will be created, and that many queries made to look up an invoice number?

Answer

When an Amazon DynamoDB table is created, you can specify the desired throughput in Reads per second and Writes per second. The table will then be provisioned across multiple servers (partitions) sufficient to provide the requested throughput.
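As a rough illustration of specifying throughput at creation time, the sketch below builds the request parameters that boto3's `create_table` call accepts for a provisioned-capacity table. The table name, key name, and capacity figures are made up for the example. "Reads per second" is expressed in read capacity units, where one unit covers one strongly consistent read per second for an item of up to 4 KB.

```python
def create_table_params(table_name: str, partition_key: str,
                        reads_per_sec: int, writes_per_sec: int) -> dict:
    """Build keyword arguments for boto3's DynamoDB client.create_table call."""
    return {
        "TableName": table_name,
        # Only key attributes are declared up front; "S" means string.
        "AttributeDefinitions": [
            {"AttributeName": partition_key, "AttributeType": "S"},
        ],
        # KeyType "HASH" marks the attribute as the partition key.
        "KeySchema": [
            {"AttributeName": partition_key, "KeyType": "HASH"},
        ],
        "BillingMode": "PROVISIONED",
        "ProvisionedThroughput": {
            "ReadCapacityUnits": reads_per_sec,
            "WriteCapacityUnits": writes_per_sec,
        },
    }


# Hypothetical table and key names, chosen only for illustration.
params = create_table_params("Customers", "customer_number", 1000, 100)

# With AWS credentials configured, the call itself would look like:
#   import boto3
#   boto3.client("dynamodb").create_table(**params)
print(params["ProvisionedThroughput"])
```

Nothing in these parameters mentions partitions: the request states only the key and the throughput.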

You do not have visibility into the number of partitions created -- it is fully managed by DynamoDB. Additional partitions will be created as the quantity of data increases or when the provisioned throughput is increased.

Let's say you have requested 1000 Reads per second and the data has been internally partitioned across 10 servers (10 partitions). Each partition will provide 100 Reads per second. If all Read requests are for the same partition key, the throughput will be limited to 100 Reads per second. If the requests are spread over a range of different values, the throughput can be the full 1000 Reads per second.
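The arithmetic in this example can be written out as a small sketch. It follows the simplified model used above, in which throughput is split evenly across partitions. The function names are invented for illustration and are not part of any DynamoDB API.

```python
def per_partition_limit(table_reads_per_sec: int, partitions: int) -> float:
    """Even split of the table's throughput across its partitions."""
    return table_reads_per_sec / partitions


def achievable_reads(table_reads_per_sec: int, partitions: int,
                     partitions_hit: int) -> float:
    """Reads per second available when traffic reaches `partitions_hit` partitions."""
    hit = min(partitions_hit, partitions)
    return per_partition_limit(table_reads_per_sec, partitions) * hit


print(achievable_reads(1000, 10, 1))   # every read uses the same partition key -> 100.0
print(achievable_reads(1000, 10, 10))  # reads spread across all partitions -> 1000.0
```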

If many queries are made for the same Partition Key, it can result in a Hot Partition that limits the total available throughput.
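The random-suffix idea raised in the question is one way to spread a busy key. The sketch below uses an in-memory dictionary as a stand-in for a table, so no AWS calls are made. It assumes the application picks the suffix range itself (10 here), which is why a read issues exactly 10 queries regardless of how many partitions DynamoDB has created. All names are hypothetical.

```python
import random
from collections import defaultdict

SHARD_COUNT = 10  # chosen by the application, not by DynamoDB

# Stand-in for a table: partition key value -> list of items.
table = defaultdict(list)


def sharded_key(invoice_number: str, shard: int) -> str:
    """Combine the invoice number with a suffix to form the partition key."""
    return f"{invoice_number}.{shard}"


def put_item(invoice_number: str, item: dict, rng: random.Random) -> None:
    """Write under a randomly chosen suffix so writes spread over several keys."""
    shard = rng.randint(1, SHARD_COUNT)
    table[sharded_key(invoice_number, shard)].append(item)


def query_invoice(invoice_number: str) -> list:
    """Read everything back by querying each suffix in turn and merging."""
    results = []
    for shard in range(1, SHARD_COUNT + 1):
        results.extend(table.get(sharded_key(invoice_number, shard), []))
    return results


rng = random.Random(0)
for line_no in range(50):
    put_item("INV-121", {"line": line_no}, rng)

print(len(query_invoice("INV-121")))  # 50: every item is found again
```

The trade-off is that each read costs `SHARD_COUNT` queries, so the suffix range is usually kept small and applied only to keys that are actually hot.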

Think of it like a bank with lines in front of teller windows. If everybody lines up at one teller, fewer customers can be served. It is more efficient to distribute customers across many different teller windows. A good partition key for distributing customers might be the customer number, since it is different for each customer. A poor partition key might be their zip code, because they all live in the same area near the bank.

The simple rule is that you should choose a Partition Key that has a range of different values.
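A small simulation can make this rule concrete. It assumes 10 partitions and uses MD5 as a stand-in hash. DynamoDB's real hash function and partition count are internal, so both are placeholders here. The point is only that many distinct keys share a fixed set of partitions, while a single repeated value always lands on one of them.

```python
import hashlib
from collections import Counter

PARTITIONS = 10  # hypothetical; DynamoDB does not expose the real number


def partition_for(key: str) -> int:
    """Map a key to a partition with MD5 (a stand-in, not DynamoDB's real hash)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PARTITIONS


def busiest_share(keys: list) -> float:
    """Fraction of requests that land on the most heavily loaded partition."""
    counts = Counter(partition_for(k) for k in keys)
    return max(counts.values()) / len(keys)


# 1000 distinct customer numbers versus 1000 customers sharing one zip code.
customer_numbers = [f"customer-{n}" for n in range(1, 1001)]
zip_codes = ["2000"] * 1000

print(busiest_share(customer_numbers))  # close to an even split
print(busiest_share(zip_codes))         # 1.0: a single partition takes all traffic
```

This also bears on the original question: 1000 distinct user IDs do not produce 1000 partitions. In this sketch they are hashed onto the same 10.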

See: Partitions and Data Distribution
