How to partition Azure tables used for storing logs


Problem description

We have recently updated our logging to use Azure table storage, which, owing to its low cost and high performance when querying by row and partition, is highly suited to this purpose.

We are trying to follow the guidelines given in the document Designing a Scalable Partitioning Strategy for Azure Table Storage. As we are making a great number of inserts to this table (and hopefully an increasing number as we scale), we need to ensure that we don't hit our limits, resulting in logs being lost. We structured our design as follows:

  • We have an Azure storage account per environment (DEV, TEST, PROD).
  • We have a table per product.
  • We are using TicksReversed+GUID for the Row Key, so that we can query blocks of results between certain times with high performance (a sketch of this key construction follows this list).
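
A minimal sketch of how such a row key could be built, assuming the Azure.Data.Tables SDK and a simple Message property (neither is specified in the question):

```csharp
using System;
using Azure.Data.Tables;

public static class LogKeys
{
    // Builds a RowKey of the form "<reversed ticks>_<guid>" so that newer
    // entries sort first lexicographically within a partition.
    public static TableEntity CreateLogEntity(string partitionKey, string message)
    {
        // Zero-pad to 19 digits so that string ordering matches numeric ordering.
        string reversedTicks = (DateTime.MaxValue.Ticks - DateTime.UtcNow.Ticks).ToString("D19");
        string rowKey = $"{reversedTicks}_{Guid.NewGuid():N}";

        return new TableEntity(partitionKey, rowKey) { ["Message"] = message };
    }
}
```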

We originally chose to partition the table by Logger, which for us meant broad areas of the product such as API, Application, Performance and Caching. However, due to the low number of partitions, we were concerned that this resulted in so-called "hot" partitions, where many inserts are performed on one partition in a given time period. So we changed to partitioning on Context (for us, the class name or API resource).

However, in practice we have found this is less than ideal, because when we look at our logs at a glance we would like them to appear in order of time. Instead we end up with blocks of results grouped by context, and we would have to fetch all partitions if we wanted to order them by time.

Some ideas we have had are to:

  • use blocks of time (say 1 hour) for partition keys to order them by time (this results in hot partitions for an hour at a time);
  • use a few random GUIDs for partition keys to try to distribute the logs (we lose the ability to query quickly on features such as Context).

As this is such a common application of Azure table storage, there must be some sort of standard procedure. What is the best practice for partitioning Azure tables that are used for storing logs? To summarise, a solution would ideally:

  • Use cheap Azure storage (Table Storage seems the obvious choice).
  • Support fast, scalable writes.
  • Have a low chance of losing logs (i.e. by exceeding the partition write rate of 2,000 entities per second in Azure table storage).
  • Allow reading ordered by date, most recent first.
  • If possible, partition on something that would be useful to query (such as product area).

Recommended answer

I have come across a situation similar to the one you encountered; based on my experience I can say the following:

Whenever a query is fired against an Azure storage table, it does a full table scan if a proper partition key is not provided. In other words, the storage table is indexed on the Partition Key, and partitioning the data properly is the key to getting fast results.

That said, you will now have to think about what kind of queries you will fire against the table, such as logs that occurred during a certain time period, logs for a particular product, and so on.

One way is to use reverse ticks truncated to hour precision, instead of the exact ticks, as part of the Partition Key. That way an hour's worth of data can be queried based on this partition key. Depending on the number of rows which fall into each partition, you could change the precision to a day. Also, it is wise to store related data together, which means the data for each product would go into a different table. That way you reduce the number of partitions and the number of rows in each partition.
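
A rough sketch of that suggestion, assuming the Azure.Data.Tables SDK; the table name, connection string and Message property are placeholders rather than anything from the answer:

```csharp
using System;
using Azure.Data.Tables;

public static class HourPartitionedLogWriter
{
    // PartitionKey = reversed ticks of the timestamp truncated to the hour, so one
    // partition holds roughly an hour of logs and more recent hours sort first.
    public static string HourPartitionKey(DateTime utc)
    {
        var hour = new DateTime(utc.Year, utc.Month, utc.Day, utc.Hour, 0, 0, DateTimeKind.Utc);
        return (DateTime.MaxValue.Ticks - hour.Ticks).ToString("D19");
    }

    public static void WriteLog(string connectionString, string productTable, string message)
    {
        // One table per product keeps the number of partitions and rows per partition down.
        var table = new TableClient(connectionString, productTable);
        table.CreateIfNotExists();

        var now = DateTime.UtcNow;
        string rowKey = $"{DateTime.MaxValue.Ticks - now.Ticks:D19}_{Guid.NewGuid():N}";
        table.AddEntity(new TableEntity(HourPartitionKey(now), rowKey) { ["Message"] = message });
    }
}
```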

Basically, ensure that you know the partition keys in advance (exact values or a range) and fire queries against those specific partition keys to get results faster.
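
For example, with the hour-precision reversed-tick partition keys sketched above, a time window maps to a small known range of partition keys and can be queried with a filter instead of a table scan (the helper and class names here are illustrative):

```csharp
using System;
using System.Collections.Generic;
using Azure.Data.Tables;

public static class LogReader
{
    static string HourKey(DateTime utc)
    {
        var hour = new DateTime(utc.Year, utc.Month, utc.Day, utc.Hour, 0, 0, DateTimeKind.Utc);
        return (DateTime.MaxValue.Ticks - hour.Ticks).ToString("D19");
    }

    // Reversed keys mean a later time yields a *smaller* key, so the window
    // [fromUtc, toUtc] becomes: HourKey(toUtc) <= PartitionKey <= HourKey(fromUtc).
    public static IEnumerable<TableEntity> QueryWindow(TableClient table, DateTime fromUtc, DateTime toUtc)
    {
        string filter = $"PartitionKey ge '{HourKey(toUtc)}' and PartitionKey le '{HourKey(fromUtc)}'";
        return table.Query<TableEntity>(filter: filter);
    }
}
```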

To speed up writing to the table, you can use batch operations. Be cautious though: if one entity in the batch fails, the whole batch operation fails. Proper retry and error checking can save you here.
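
A minimal sketch of such a batched insert with basic error checking, again assuming the Azure.Data.Tables SDK; note that a single transaction is limited to 100 entities that must all share one partition key, and the retry policy is left out:

```csharp
using System;
using System.Collections.Generic;
using Azure.Data.Tables;

public static class BatchLogWriter
{
    // All entities in one transaction must share the same PartitionKey, and a
    // single transaction is limited to 100 entities.
    public static void WriteBatch(TableClient table, IReadOnlyList<TableEntity> entities)
    {
        var actions = new List<TableTransactionAction>();
        foreach (var entity in entities)
        {
            actions.Add(new TableTransactionAction(TableTransactionActionType.Add, entity));
        }

        try
        {
            table.SubmitTransaction(actions);
        }
        catch (TableTransactionFailedException ex)
        {
            // One failing entity fails the whole batch; the exception reports which one.
            Console.Error.WriteLine($"Batch failed at entity index {ex.FailedTransactionActionIndex}: {ex.Message}");
            // A real implementation would fix or drop the offending entity and retry.
        }
    }
}
```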

At the same time, you could use blob storage to store a lot of related data. The idea is to store a chunk of related serialized data as one blob. You can hit one such blob to get all the data in it and do further projections on the client side. For example, an hour's worth of data for a product would go into one blob; you can devise a specific blob prefix naming pattern and hit the exact blob when needed. This will help you get your data pretty fast, rather than doing a table scan for each query.
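
A sketch of what that blob-per-hour layout could look like with the Azure.Storage.Blobs client; the container name and naming pattern are illustrative assumptions, not part of the answer:

```csharp
using System;
using System.IO;
using Azure.Storage.Blobs;

public static class LogBlobStore
{
    // Stores one hour of one product's logs as a single blob, e.g.
    // "myproduct/2015/06/01/13.log.gz", so the exact blob for a given hour can be
    // addressed directly instead of scanning a table.
    public static void UploadHourChunk(string connectionString, string product, DateTime hourUtc, byte[] payload)
    {
        var container = new BlobContainerClient(connectionString, "logs");
        container.CreateIfNotExists();

        string blobName = $"{product}/{hourUtc:yyyy/MM/dd/HH}.log.gz";
        using var stream = new MemoryStream(payload);
        container.GetBlobClient(blobName).Upload(stream, overwrite: true);
    }
}
```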

I used the blob approach and have been using it for a couple of years with no trouble. I convert my collection to IList<IDictionary<string,string>> and use binary serialization and Gzip to store each blob. I use Reflection.Emit-based helper methods to access entity properties very quickly, so serialization and deserialization don't take a toll on CPU and memory.
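
The answer does not show its serialization code; as an approximation, the same idea can be sketched with JSON plus GZipStream (the original used binary serialization and Reflection.Emit helpers instead):

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text.Json;

public static class LogChunkSerializer
{
    // Serializes a chunk of log rows and gzips the result so it can be stored as one blob.
    // The original answer used binary serialization; JSON is used here for a self-contained sketch.
    public static byte[] Compress(IList<IDictionary<string, string>> rows)
    {
        byte[] json = JsonSerializer.SerializeToUtf8Bytes(rows);
        using var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionLevel.Optimal, leaveOpen: true))
        {
            gzip.Write(json, 0, json.Length);
        }
        return output.ToArray();
    }

    public static IList<IDictionary<string, string>> Decompress(byte[] compressed)
    {
        using var input = new MemoryStream(compressed);
        using var gzip = new GZipStream(input, CompressionMode.Decompress);
        return JsonSerializer.Deserialize<IList<IDictionary<string, string>>>(gzip);
    }
}
```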

Storing data in blobs helps me store more for less and get my data back faster.
