Choosing the right partitioning rule

Views: 96 | Published: 2020/5/30 0:37:52 | Tags: postgresql, database-partitioning


Problem description

I am setting up a new PostgreSQL 9 database that will contain millions (or maybe billions) of rows, so I decided to partition the data using PostgreSQL inheritance.

I created a master table like this (simplified for the example):

CREATE TABLE mytable
(
  user_id integer,
  year integer,
  CONSTRAINT pk_mytable PRIMARY KEY (user_id, year)
);

And 10 partition tables:

CREATE TABLE mytable_0 () INHERITS (mytable);
CREATE TABLE mytable_1 () INHERITS (mytable);
...
CREATE TABLE mytable_9 () INHERITS (mytable);

I know that rows will always be accessed from the application using a condition on a single user_id, so I would like to spread the data roughly equally over the 10 tables using a rule based on user_id.

To tune queries over the master table, my first idea was to use a modulus check constraint:

ALTER TABLE mytable_0 ADD CONSTRAINT mytable_user_id_check CHECK (user_id % 10 = 0);
ALTER TABLE mytable_1 ADD CONSTRAINT mytable_user_id_check CHECK (user_id % 10 = 1);
...
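The constraints above only declare which rows each child table may hold; rows inserted through the master table still need to be routed to the matching child. A minimal sketch of one way to do that with a trigger follows. The function name mytable_insert_router and the trigger name mytable_insert_trigger are hypothetical, and the sketch assumes user_id is never NULL or negative, so that user_id % 10 always falls in 0..9.

```sql
-- Hypothetical routing trigger: names are illustrative, not from the original post.
CREATE OR REPLACE FUNCTION mytable_insert_router()
RETURNS trigger AS $$
BEGIN
  -- Assumption: user_id is non-negative and not NULL, so the suffix is 0..9.
  -- Build the child table name from the modulus and insert the whole row there.
  EXECUTE 'INSERT INTO mytable_' || (NEW.user_id % 10)::text
          || ' SELECT ($1).*'
    USING NEW;
  RETURN NULL;  -- skip the insert into the master table itself
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER mytable_insert_trigger
  BEFORE INSERT ON mytable
  FOR EACH ROW EXECUTE PROCEDURE mytable_insert_router();
```

With this in place, an INSERT INTO mytable for user_id 12345 would be stored in mytable_5. Note that the primary key declared on mytable is not inherited by the child tables, so each child would need its own key or index on user_id.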

The problem is that when I query the master table mytable with a condition on user_id, the PostgreSQL planner scans all the tables and does not benefit from the check constraint:

EXPLAIN SELECT * FROM mytable WHERE user_id = 12345;

"Result  (cost=0.00..152.69 rows=64 width=36)"
"  ->  Append  (cost=0.00..152.69 rows=64 width=36)"
"        ->  Seq Scan on mytable  (cost=0.00..25.38 rows=6 width=36)"
"              Filter: (user_id = 12345)"
"        ->  Seq Scan on mytable_0 mytable  (cost=0.00..1.29 rows=1 width=36)"
"              Filter: (user_id = 12345)"
"        ->  Seq Scan on mytable_1 mytable  (cost=0.00..1.52 rows=1 width=36)"
"              Filter: (user_id = 12345)"
...
"        ->  Seq Scan on mytable_9 mytable  (cost=0.00..1.52 rows=1 width=36)"
"              Filter: (user_id = 12345)"
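One thing that may be worth trying at this point, offered as a sketch under assumptions and not as a confirmed fix: constraint exclusion compares the WHERE clause against each CHECK constraint without evaluating arbitrary expressions, so from user_id = 12345 alone the planner does not derive that user_id % 10 = 5. Repeating the modulus expression in the query gives it a predicate with the same shape as the constraint. The sketch assumes the constraint_exclusion setting is partition or on.

```sql
-- Check that constraint exclusion is enabled ('partition' or 'on').
SHOW constraint_exclusion;

-- Repeat the partitioning expression so the planner can compare it
-- directly with CHECK (user_id % 10 = N) on each child table.
-- 12345 % 10 is folded to a constant at planning time.
EXPLAIN
SELECT *
FROM mytable
WHERE user_id = 12345
  AND user_id % 10 = 12345 % 10;
```

If this behaves as expected, the plan should list only mytable and the one child whose constraint matches. The cost is that every query from the application has to carry the extra predicate.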

Whereas if I use a classic range CHECK constraint like this (with the data distributed to match that rule):

ALTER TABLE mytable_0 ADD CONSTRAINT mytable_user_id_check CHECK (user_id BETWEEN 1 AND 10000);
ALTER TABLE mytable_1 ADD CONSTRAINT mytable_user_id_check CHECK (user_id BETWEEN 10001 AND 20000);
...

the planner scans only the tables that match the condition (mytable and mytable_1 in this example):

"Result  (cost=0.00..152.69 rows=64 width=36)"
"  ->  Append  (cost=0.00..152.69 rows=64 width=36)"
"        ->  Seq Scan on mytable  (cost=0.00..25.38 rows=6 width=36)"
"              Filter: (user_id = 12345)"
"        ->  Seq Scan on mytable_1 mytable  (cost=0.00..1.52 rows=1 width=36)"
"              Filter: (user_id = 12345)"

But such range constraints are difficult to maintain, because the range of user IDs stored in the tables will change over the years: thousands at first, maybe millions or more in the near future.

What rule could I use to partition my data equally over the 10 tables while still benefiting from check constraints, so that a SELECT on the master table scans only the right table?

Thanks,

Nico
