Azure Table Partitioning Strategy

Question

I am trying to come up with a partition key strategy based on a DateTime that doesn't result in the Append-Only write bottleneck often described in best practices guidelines.

Basically, if you partition by something like YYYY-MM-DD, all your writes for a particular day will end up in the same partition, which will reduce write performance.
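To make the bottleneck concrete, here is a minimal Python sketch, assuming a hypothetical workload of one write per minute on a single illustrative date. With a date-only key, every one of those writes maps to the same partition:

```python
from collections import Counter
from datetime import datetime, timedelta

def naive_partition_key(ts: datetime) -> str:
    # One partition per calendar day: every write on that day shares this key.
    return ts.strftime("%Y-%m-%d")

# Hypothetical workload: one write per minute for a whole day.
start = datetime(2024, 1, 15)
writes = [start + timedelta(minutes=i) for i in range(24 * 60)]

counts = Counter(naive_partition_key(ts) for ts in writes)
print(counts)  # Counter({'2024-01-15': 1440})
```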

Ideally, a partition key should evenly distribute writes across as many partitions as possible.

To accomplish this while still basing the key on a DateTime value, I need a way to assign DateTime values to buckets, where the number of buckets per time interval is predetermined (say, 50 a day). The assignment of a value to a bucket should be as random as possible, but always the same for a given value, because I need to be able to recover the correct partition from the original DateTime value. In other words, this is like a hash.
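One way to get "random but repeatable" bucket assignment is a stable hash of the timestamp taken modulo the bucket count. The sketch below is an assumption, not something from the question: it hashes the ISO-8601 string with SHA-256 from `hashlib`, and the name `bucket_for` is hypothetical. It deliberately avoids Python's built-in `hash()`, which is salted per process for strings and so would not return the same bucket across runs.

```python
import hashlib
from datetime import datetime, timedelta

BUCKETS_PER_DAY = 50  # the "say 50 a day" from the question

def bucket_for(ts: datetime, buckets: int = BUCKETS_PER_DAY) -> int:
    # Assumes timestamps are normalized (e.g. all UTC) before hashing,
    # since naive and aware datetimes produce different ISO strings.
    digest = hashlib.sha256(ts.isoformat().encode("utf-8")).digest()
    # Take 64 bits of the digest; the modulo bias over 50 buckets is negligible.
    return int.from_bytes(digest[:8], "big") % buckets

ts = datetime(2024, 1, 15, 9, 30, 12, 345000)
print(bucket_for(ts))  # some bucket in 0..49, identical on every run

# Spread check: minute-spaced timestamps over one day.
day = [datetime(2024, 1, 15) + timedelta(minutes=i) for i in range(24 * 60)]
print(len({bucket_for(t) for t in day}))
```

Because the bucket depends only on the timestamp, recomputing `bucket_for` on the original DateTime always points back to the same partition.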

Lastly, and critically, I need the partition key to be sequential at some aggregate level. So while the DateTime values for a given interval (say, 1 day) would be randomly distributed across X partition keys, all the partition keys for that day would fall within a queryable range. This would let me query all rows for my aggregate interval and then sort them by DateTime value to get the correct order.
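A sketch of the "sequential at the aggregate level" requirement, assuming a hypothetical key format of `YYYY-MM-DD_BB` (day prefix plus zero-padded bucket). Azure Table compares `PartitionKey` values as strings, so zero-padding keeps every key for a day inside one contiguous range. The helper names and the `EventTime` property are illustrative assumptions; the code only builds an OData-style filter string and does not call any Azure SDK.

```python
from datetime import date, datetime

BUCKETS_PER_DAY = 50  # assumed bucket count, as in the question

def partition_key(day: date, bucket: int, buckets: int = BUCKETS_PER_DAY) -> str:
    # Zero-pad the bucket so lexicographic (string) order matches numeric order.
    width = len(str(buckets - 1))
    return f"{day:%Y-%m-%d}_{bucket:0{width}d}"

def day_range_filter(day: date, buckets: int = BUCKETS_PER_DAY) -> str:
    # Every key for `day` sorts between its first- and last-bucket keys,
    # and no other day's key falls inside that range.
    low = partition_key(day, 0, buckets)
    high = partition_key(day, buckets - 1, buckets)
    return f"PartitionKey ge '{low}' and PartitionKey le '{high}'"

print(day_range_filter(date(2024, 1, 15)))
# PartitionKey ge '2024-01-15_00' and PartitionKey le '2024-01-15_49'

# After fetching, restore chronological order client-side.
# Hypothetical rows; "EventTime" stands in for the original DateTime value.
rows = [
    {"PartitionKey": "2024-01-15_31", "EventTime": datetime(2024, 1, 15, 9, 0)},
    {"PartitionKey": "2024-01-15_04", "EventTime": datetime(2024, 1, 15, 8, 0)},
]
ordered = sorted(rows, key=lambda r: r["EventTime"])
```

Storing the original DateTime as its own property is what makes the final sort possible, since rows come back grouped by partition rather than in time order.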

Thoughts? This must be a fairly well known problem that has been solved already.

Recommended Answer

How about using the millisecond component of the timestamp, mod 50? That would give you a random distribution throughout the day, the value itself would be sequential, and you could easily calculate the PartitionKey later from the original timestamp.
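A minimal sketch of the answer's suggestion, assuming the bucket is appended to a date prefix in the same hypothetical `YYYY-MM-DD_BB` format (the function name is also hypothetical). Because 1000 is divisible by 50, each bucket receives exactly 20 of the possible millisecond values. The spread is only as random as the write timing, though: if timestamps are truncated to whole seconds, the millisecond is always 0 and every row lands in bucket 00.

```python
from datetime import datetime

BUCKETS_PER_DAY = 50

def answer_partition_key(ts: datetime, buckets: int = BUCKETS_PER_DAY) -> str:
    millisecond = ts.microsecond // 1000  # 0..999
    bucket = millisecond % buckets        # 0..buckets-1
    width = len(str(buckets - 1))         # zero-pad for string ordering
    return f"{ts:%Y-%m-%d}_{bucket:0{width}d}"

ts = datetime(2024, 1, 15, 9, 30, 12, 345000)
print(answer_partition_key(ts))  # 2024-01-15_45
```

Recomputing the key from the stored timestamp is deterministic, which satisfies the question's requirement to always find the right partition again.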
