How many table partitions is too many in Postgres?


Question


I'm partitioning a very large table that contains temporal data, and considering to what granularity I should make the partitions. The Postgres partition documentation claims that "large numbers of partitions are likely to increase query planning time considerably" and recommends that partitioning be used with "up to perhaps a hundred" partitions.


Assuming my table holds ten years of data, if I partitioned by week I would end up with over 500 partitions. Before I rule this out, I'd like to better understand what impact partition quantity has on query planning time. Has anyone benchmarked this, or does anyone have an understanding of how this works internally?
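As a back-of-the-envelope check of the "over 500 partitions" figure, the sketch below counts weekly versus monthly partitions over a ten-year window (the specific dates are an arbitrary illustrative choice, not from the question):

```python
from datetime import date
from math import ceil

# Hypothetical ten-year retention window
start, end = date(2004, 1, 1), date(2014, 1, 1)
days = (end - start).days  # includes three leap days

weekly = ceil(days / 7)  # one partition per week
monthly = 12 * 10        # one partition per month

print(weekly, monthly)   # weekly comes out just over 500; monthly is 120
```

Weekly partitioning lands right where the question worries it will, while monthly partitioning stays comfortably near the documentation's "up to perhaps a hundred" guidance.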

Answer


The query planner has to do a linear search of the constraint information for every partition of tables used in the query, to figure out which are actually involved--the ones that can have rows needed for the data requested. The number of query plans the planner considers grows exponentially as you join more tables. So the exact spot where that linear search adds up to enough time to be troubling really depends on query complexity. The more joins, the worse you will get hit by this. The "up to a hundred" figure came from noting that query planning time was adding up to a non-trivial amount of time even on simpler queries around that point. On web applications in particular, where latency of response time is important, that's a problem; thus the warning.
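The linear search the answer describes can be sketched in a few lines: for each partition, compare its range constraint against the query's predicate and keep only the partitions that could hold matching rows. This is an illustrative model of the idea, not Postgres internals; the partition names and ranges are made up:

```python
def prune_partitions(partitions, q_lo, q_hi):
    """Linear scan over per-partition range constraints, keeping the
    partitions whose range [lo, hi) overlaps the query range [q_lo, q_hi).
    Analogous to the planner checking each partition's CHECK constraint."""
    return [name for name, (lo, hi) in partitions.items()
            if lo < q_hi and q_lo < hi]

# 500 hypothetical partitions covering [0, 5000) in steps of 10
parts = {f"p{i}": (i * 10, (i + 1) * 10) for i in range(500)}

hit = prune_partitions(parts, 95, 125)
print(hit)  # only the partitions overlapping [95, 125) survive
```

The scan itself is O(number of partitions), and since the planner repeats this check for each candidate plan involving the table, the cost compounds as join count grows.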


Can you support 500? Sure. But you are going to be searching every one of 500 check constraints for every query plan involving that table considered by the optimizer. If query planning time isn't a concern for you, then maybe you don't care. But most sites end up disliking the proportion of time spent on query planning with that many partitions, which is one reason why monthly partitioning is the standard for most data sets. You can easily store 10 years of data, partitioned monthly, before you start crossing over into where planning overhead starts to be noticeable.

