Application-level JOIN with WHERE and ORDER BY on N PostgreSQL shards


Question

I have a PostgreSQL cluster in which different tables reside on different shards (different physical PostgreSQL servers). For example:

Shard A: user_group (user_group_id, user_group_name)

Shard B: user (user_id, user_group_id (NULL), user_name)

Shard C: comment (comment_id, user_id, comment_content)

I need to run queries that, if all three tables were on the same shard, would look something like:

SELECT comment_id, comment_content
FROM comment
INNER JOIN user ON comment.user_id = user.user_id
LEFT JOIN user_group ON user.user_group_id = user_group.user_group_id
WHERE user_group_id > 10 AND user_name LIKE 'foo%'
ORDER BY user_group_name ASC, user_name ASC, comment_id ASC

How would such a query be implemented if the three tables reside on three different physical PostgreSQL shards?

I've read that one would have to 'do the join in the application layer', but I am not sure how to go about this. Some of the complexities include:

1. The cardinality of the different tables is unknown (or can change over time), so from the application layer (e.g. PHP, Python) we would not know whether to first query user_group and get all the user groups, then query user and get all the users, and so on; or to first query comment and get all the comments, then filter the retrieved comments by user, then by user group, and so on.

I am looking for a generic way to translate SQL into application-level joins; the schema above is only a hypothetical example.
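To make the idea concrete, here is a minimal Python sketch of one possible application-level join for the example query. It is not a generic translator and does not solve the cardinality problem. It assumes that user_group_id in the WHERE clause means user.user_group_id, so both predicates can be pushed to shard B first. The in-memory lists, the sample rows, and the function name app_level_join are hypothetical stand-ins for real per-shard queries.

```python
# Hypothetical in-memory stand-ins for the three shards. A real version
# would run one SQL query per shard instead of filtering Python lists.
SHARD_A_USER_GROUP = [
    {"user_group_id": 11, "user_group_name": "beta"},
    {"user_group_id": 12, "user_group_name": "alpha"},
]
SHARD_B_USER = [
    {"user_id": 1, "user_group_id": 11, "user_name": "foo_bob"},
    {"user_id": 2, "user_group_id": 12, "user_name": "foo_amy"},
    {"user_id": 3, "user_group_id": 5, "user_name": "foo_low"},
    {"user_id": 4, "user_group_id": None, "user_name": "foo_none"},
    {"user_id": 5, "user_group_id": 12, "user_name": "bar"},
    {"user_id": 6, "user_group_id": 99, "user_name": "foo_orphan"},
]
SHARD_C_COMMENT = [
    {"comment_id": 100, "user_id": 1, "comment_content": "c100"},
    {"comment_id": 101, "user_id": 2, "comment_content": "c101"},
    {"comment_id": 102, "user_id": 2, "comment_content": "c102"},
    {"comment_id": 103, "user_id": 3, "comment_content": "c103"},
    {"comment_id": 104, "user_id": 6, "comment_content": "c104"},
    {"comment_id": 105, "user_id": 5, "comment_content": "c105"},
]


def app_level_join():
    # Step 1 (shard B): apply both WHERE predicates to the user table.
    # Assumption: user_group_id here is user.user_group_id.
    users = {
        u["user_id"]: u
        for u in SHARD_B_USER
        if u["user_group_id"] is not None
        and u["user_group_id"] > 10
        and u["user_name"].startswith("foo")  # LIKE 'foo%'
    }

    # Step 2 (shard A): fetch only the groups the surviving users reference.
    wanted_groups = {u["user_group_id"] for u in users.values()}
    group_names = {
        g["user_group_id"]: g["user_group_name"]
        for g in SHARD_A_USER_GROUP
        if g["user_group_id"] in wanted_groups
    }

    # Step 3 (shard C): fetch comments for the surviving users (INNER JOIN).
    rows = []
    for c in SHARD_C_COMMENT:
        user = users.get(c["user_id"])
        if user is None:
            continue
        # LEFT JOIN: a missing group yields None for user_group_name.
        group_name = group_names.get(user["user_group_id"])
        rows.append((group_name, user["user_name"],
                     c["comment_id"], c["comment_content"]))

    # Step 4: ORDER BY user_group_name, user_name, comment_id.
    # PostgreSQL puts NULLs last for ASC, so None sorts after any name.
    rows.sort(key=lambda r: (r[0] is None, r[0] or "", r[1], r[2]))
    return [(comment_id, content) for _, _, comment_id, content in rows]


if __name__ == "__main__":
    for row in app_level_join():
        print(row)
```

Starting from the user table is only a reasonable choice here because both filters apply to it. With different predicates or table sizes another order could be cheaper. Python's string ordering may also differ from the database collation.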

Recommended answer

Data is usually divided across shards in a way that avoids cross-server JOINs altogether, because that operation is difficult and expensive. If your example is hypothetical, I would recommend partitioning all the data by the user_id or user_group_id field.

For example, shard A would contain all tables holding information for users with user_id % 3 = 0, shard B for users with user_id % 3 = 1, and shard C for users with user_id % 3 = 2. Most of the JOINs you need will then stay inside one shard. For some complex cross-server queries you may keep a shared NoSQL store such as memcached or Redis that holds copies of the needed data from all shards (not a full copy of every table, of course). Such stores can easily be replicated across as many servers as you need. This is how high-load projects work.
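The modulo scheme above can be sketched in a few lines of Python. This is only an illustration: the names SHARD_NAMES, shard_for_user, and group_by_shard are hypothetical, and a real setup would map each shard label to a connection rather than a string.

```python
# Hypothetical shard labels; index i holds users with user_id % 3 == i.
SHARD_NAMES = ["A", "B", "C"]


def shard_for_user(user_id: int) -> str:
    """Return the label of the shard that owns this user's rows."""
    return SHARD_NAMES[user_id % len(SHARD_NAMES)]


def group_by_shard(user_ids):
    """Group user ids by owning shard, so each shard gets one query."""
    by_shard = {}
    for user_id in user_ids:
        by_shard.setdefault(shard_for_user(user_id), []).append(user_id)
    return by_shard


if __name__ == "__main__":
    print(shard_for_user(4))
    print(group_by_shard([1, 2, 3, 4, 6]))
```

Because a user's rows and comments would live on the same shard under this scheme, the comment/user JOIN can run as ordinary SQL on that shard. One caveat of plain modulo routing is that changing the number of shards moves most users to a different shard.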
