Azure Table Storage Partition Key

Problem Description

Two related questions:

1) Is there any way to get the ID of the server a table entity lives on? 2) Will using a GUID give me the best partition key distribution possible? If not, what will?

We have been struggling for weeks with table storage performance. In short, it's really bad, but early on we realized that a random-ish partition key distributes the entities across many servers, which is exactly what we want, since we are trying to achieve 8,000 reads per second. Apparently our partition key wasn't random enough, so for testing purposes I decided to just use a GUID. First impression is that it is way faster.

Really bad GET performance is < 1,000 per second. The partition key is Guid.NewGuid() and the row key is the constant "UserInfo". The Get is executed using a TableOperation with pk and rk, nothing else, as follows: TableOperation retrieveOperation = TableOperation.Retrieve(pk, rk); return cloudTable.ExecuteAsync(retrieveOperation);. We always use indexed reads and never table scans. Also, the VM size is medium or large, never anything smaller. Parallel no, async yes.

Recommended Answer

As other users have pointed out, Azure Tables are strictly controlled by the runtime, so you cannot control or check which specific storage nodes are handling your requests. Furthermore, any given partition is served by a single server; that is, entities belonging to the same partition cannot be split across several storage nodes (see HERE):

In Windows Azure Tables, the PartitionKey property is used as the partition key. All entities with the same PartitionKey value are clustered together and they are served from a single server node. This allows the user to control entity locality by setting the PartitionKey value, and to perform Entity Group Transactions over entities in the same partition.

You mention that you are targeting 8,000 requests per second? If that is the case, you might be hitting a threshold that requires very good table/partition key design. Please see the article "Windows Azure Storage Abstractions and their Scalability Targets".

The following extract is applicable to your situation:

This gives a single storage account created after June 7th, 2012 the following scalability targets:

  • Capacity – Up to 200 TBs
  • Transactions – Up to 20,000 entities/messages/blobs per second

As other users pointed out, if your PartitionKey values follow an incremental pattern, the Azure runtime will recognize this and group some of your partitions onto the same storage node.

Furthermore, if I understood your question correctly, you are currently assigning partition keys via GUIDs? If that is the case, every PartitionKey in your table will be unique, so every partition will hold no more than one entity. As per the articles above, the way Azure Tables scale out is by grouping entities by their partition keys across independent storage nodes. If your partition keys are unique, and thus each contains no more than one entity, Azure Tables can only scale out one entity at a time! Now, we know Azure is not that dumb: it groups partition keys when it detects a pattern in the way they are created. So if you are hitting this trigger and Azure is grouping your partition keys, your scalability is limited to the smartness of this grouping algorithm.
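One common way to keep the distribution benefit of random-looking keys without making every key unique is to hash a natural key into a fixed number of buckets. A minimal sketch (illustrative only, not the asker's actual code; the bucket count, key format, and `user_id` input are assumptions):

```python
import hashlib

def partition_key(user_id: str, buckets: int = 4) -> str:
    """Map a user id to one of `buckets` stable partition keys.

    Unlike Guid.NewGuid(), which makes every PartitionKey unique,
    hashing groups entities into a fixed set of partitions that the
    service can spread across independent storage nodes.
    """
    digest = int(hashlib.md5(user_id.encode("utf-8")).hexdigest(), 16)
    return f"bucket-{digest % buckets:03d}"
```

Because the same input always yields the same bucket, point reads with (pk, rk) keep working; the bucket count then caps how far the table can scale out.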

As per the scalability targets above for 2012, each partition key should be able to give you 2,000 transactions per second. Theoretically, you should need no more than four partition keys in this case (assuming that the workload between the four is distributed equally).
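The arithmetic behind that estimate is a simple back-of-the-envelope calculation (the 2,000/s figure is the 2012 per-partition target quoted above):

```python
target_rps = 8000          # desired reads per second
per_partition_rps = 2000   # 2012 scalability target per partition

# Ceiling division: partitions needed if load is spread evenly.
min_partitions = -(-target_rps // per_partition_rps)
print(min_partitions)  # 4
```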

I would suggest you design your partition keys to group entities in such a way that no more than 2,000 transactions per second per partition are reached, and drop GUIDs as partition keys. This will allow you to better support features such as Entity Group Transactions, reduce the complexity of your table design, and get the performance you are looking for.
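As a sketch of why grouped keys help with Entity Group Transactions (the entity values here are purely illustrative): a batch may only touch a single partition, so entities sharing a PartitionKey can be committed together, while unique GUID keys would force one batch per entity.

```python
from itertools import groupby

# Hypothetical pending writes using grouped partition keys.
entities = [
    {"PartitionKey": "bucket-000", "RowKey": "user1"},
    {"PartitionKey": "bucket-000", "RowKey": "user2"},
    {"PartitionKey": "bucket-001", "RowKey": "user3"},
]
entities.sort(key=lambda e: e["PartitionKey"])

# An Entity Group Transaction may only span one partition, so each
# batch is the set of pending entities sharing a PartitionKey.
batches = [list(g) for _, g in groupby(entities, key=lambda e: e["PartitionKey"])]
# -> two batches: [user1, user2] and [user3]
```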
