What is the use of a hash range in a DynamoDB table?


Question

I am new to DynamoDB (ddb). I was going through its documentation, and it says to add a Hash Key and a Hash Range key. The documentation says that ddb will create an unsorted index on the hash key and a sorted index on the hash range.

What is the purpose of having these 2 keys rather than just one key? Is it because the first key is used like a HashTable which contains: key -> range of keys for each value in the hash range

2nd HashTable: hash range key -> actual data value.

This would help segregate data and make lookups fast. But then why only 2 levels of HashMaps? I could do this for n layers and get even faster lookups.

Thanks.

Recommended answer

Q: What is the purpose of having these 2 keys rather than just one key?

In terms of the Data Model, the Hash Key identifies a record in your table (uniquely on its own when it is the only key, or together with the Range Key when both are defined), and the Range Key can optionally be used to group and sort several records that are usually retrieved together. Example: if you are defining an Aggregate to store Order Items, the OrderId could be your Hash Key and the OrderItemId the Range Key. Below is a formal definition of the use of these two keys:


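"A composite hash key with a range key allows the developer to create a primary key that is the composite of two attributes, a hash attribute and a range attribute. When querying against a composite key, the hash attribute needs to be uniquely matched, but a range operation can be specified for the range attribute: e.g. all orders from Werner in the past 24 hours, or all games played by an individual player in the past 24 hours." [VOGELS]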
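
To make the query semantics in that definition concrete, here is a minimal in-memory sketch in Python. It is a toy model only and says nothing about how DynamoDB is actually implemented; the class name `CompositeKeyTable`, its methods, and the sample order data are all made up for illustration. The hash key must match exactly, while the range key accepts a range condition.

```python
import bisect
from collections import defaultdict


class CompositeKeyTable:
    """Toy model of a table with a hash key and a range key (hypothetical)."""

    def __init__(self):
        # hash key -> sorted list of the range keys stored under it
        self._range_keys = defaultdict(list)
        # (hash key, range key) -> stored item
        self._items = {}

    def put(self, hash_key, range_key, item):
        key = (hash_key, range_key)
        if key not in self._items:
            # Keep the range keys of each hash key in sorted order.
            bisect.insort(self._range_keys[hash_key], range_key)
        self._items[key] = item

    def get(self, hash_key, range_key):
        # An exact lookup needs both parts of the primary key.
        return self._items.get((hash_key, range_key))

    def query(self, hash_key, low, high):
        # Exact match on the hash key, inclusive range on the range key.
        keys = self._range_keys.get(hash_key, [])
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_right(keys, high)
        return [self._items[(hash_key, rk)] for rk in keys[start:end]]


table = CompositeKeyTable()
table.put("order-1", 1, "keyboard")
table.put("order-1", 2, "mouse")
table.put("order-1", 3, "monitor")
table.put("order-2", 1, "desk")

# All items of order-1 whose item id is between 2 and 3.
items_in_range = table.query("order-1", 2, 3)
```

In this sketch the first level is a plain hash lookup and the second level is a sorted list, which is why a range condition only makes sense on the second key.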

So the Range Key adds a grouping capability to the Data Model. However, the use of these two keys also has an implication for the Storage Model:


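"Dynamo uses consistent hashing to partition its key space across its replicas and to ensure uniform load distribution. A uniform key distribution can help us achieve uniform load distribution assuming the access distribution of keys is not highly skewed." [DDB-SOSP2007]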

Not only does the Hash Key identify the record, it is also the mechanism that ensures load distribution. The Range Key (when used) indicates which records will mostly be retrieved together, so the storage can also be optimized for that access pattern.
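
As a rough illustration of how a hash key can drive load distribution, the following Python sketch places hash keys on a consistent-hashing ring. It is a simplified model under stated assumptions, not Dynamo's or DynamoDB's actual implementation: the names `HashRing`, `ring_position`, the `vnodes` parameter, the node names, and the sample keys are all hypothetical, and replication is left out entirely.

```python
import bisect
import hashlib
from collections import Counter


def ring_position(value: str) -> int:
    # Map any string to a stable position on the ring (MD5 used for illustration).
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Simplified consistent-hashing ring with virtual nodes (hypothetical)."""

    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._owner = {}      # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node owns several virtual points to smooth the distribution.
        for i in range(self._vnodes):
            pos = ring_position(f"{node}#{i}")
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def node_for(self, hash_key):
        # Walk clockwise to the first virtual point at or after the key,
        # wrapping around to the start of the ring if necessary.
        pos = ring_position(hash_key)
        idx = bisect.bisect_left(self._positions, pos) % len(self._positions)
        return self._owner[self._positions[idx]]


keys = [f"order-{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}
load = Counter(before.values())  # how many keys each node holds

# Scale out by one node and see which keys change owner.
ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

In this sketch, adding `node-d` only relocates keys that now fall on the new node's points; every other key stays where it was. Only the hash key takes part in placement, so all items that share a hash key end up on the same node regardless of their range key.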

Q: But then why only 2 levels of HashMaps? I could do this for n layers and get faster lookups.

Having many layers of lookups would add exponential complexity to running the database effectively in a cluster environment, which is one of the most essential use cases for the majority of NoSQL databases. The database has to be highly available, failure-proof, effectively scalable, and still perform in a distributed environment.


"One of the key design requirements for Dynamo is that it must scale incrementally. This requires a mechanism to dynamically partition the data over the set of nodes (i.e., storage hosts) in the system. Dynamo's partitioning scheme relies on consistent hashing to distribute the load across multiple storage hosts." [DDB-SOSP2007]

It is always a trade-off: every limitation that you see in NoSQL databases is most likely introduced by the storage model requirements. Although relational databases are very flexible in terms of data modeling, they have several limitations when it comes to running in a distributed environment.

Choosing the correct keys to represent your data is one of the most critical aspects of your design process, and it directly impacts how well your application will perform and scale, and how much it will cost.

Footnotes:


  • The Data Model is the model through which we perceive and manipulate our data. It describes how we interact with the data in the database [FOWLER]. In other words, it is how you abstract your data: the way you group your entities, the attributes that you choose as primary keys, etc.

  • The Storage Model describes how the database stores and manipulates the data internally [FOWLER]. Although you cannot control this directly, you can certainly optimize how the data is retrieved or written by knowing how the database works internally.
