如何在像MySQL这样的RDBMS中存储双向关系? [英] How to store bidirectional relationships in a RDBMS like MySQL?

查看:202
本文介绍了如何在像MySQL这样的RDBMS中存储双向关系?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我想在我的应用程序的用户之间存储关系,类似于Facebook本身。

Suppose I want to store relationships among the users of my application, similar to Facebook, per se.

这意味着如果 A B 的朋友(或一些关系),然后 B 也是 A 的朋友。为了存储这种关系,我目前正在计划将它们存储在一个关系表中,如下所示

That means if A is a friend(or some relation) of B, then B is also a friend of A. To store this relationships I am currently planning to store them in a table for relations as follows

  UID      FriendID
 ------    --------
 user1      user2
 user1      user3
 user2      user1

但是我在这里面临两个选择:

However I am facing two options here:


  1. 典型的情况是,我将存储 user1 - > user2 user2-> user1 。这将占用更多的空间,但是(至少在我的头脑中)只需要一遍就可以显示特定用户的朋友。

  2. 另一个选项是存储 user1-> user2 OR user2-> user1 每当我想找到<$ c $的所有朋友c> user1 ,我将查询表的两列来查找用户的朋友。它将占用一半的空间(至少在我的头上)两倍的时间。

  1. The typical case, where I will store both user1 -> user2 and user2->user1. This will take more space, but (at least in my head) require just one pass over the rows to display the friends of a particular user.
  2. The other option would be to store either user1->user2 OR user2->user1 and whenever I want to find all the friends of user1, I will query on both columns of table to find a user's friends. It will take half the space but (again at least in my head) twice the amount of time.

首先,是我的推理适当?如果是的话,那么我有忘记的瓶颈(按比例/吞吐量还是任何东西)?

First of all, is my reasoning appropriate? If yes, then are there any bottlenecks that I am forgetting (in terms of scaling / throughput or anything)?

基本上,两者之间有什么折中,除了这里列出的。此外,在行业中还有一个优先选择?

Basically, are there any trade-offs between the two, other than the ones listed here. Also, in industry is one preferred over the other?

推荐答案

这是这两种方法在数据库中的物理表示方式:

Here is how these two approaches will be physically represented in the database:

让我们分析两种方法...

Let us analyze both approaches...

方法1(表中存储的两个方向):

Approach 1 (both directions stored in the table):


  • PRO:更简单的查询。

  • CON:只能插入/更新/删除

  • MINOR PRO:不需要额外的限制来确保友谊不能重复。

  • 需要进一步分析:

  • PRO: Simpler queries.
  • CON: Data can be corrupted by inserting/updating/deleting only one direction.
  • MINOR PRO: Doesn't require additional constraints to ensure a friendship cannot be duplicated.
  • Further analysis needed:

  1. TIE:一个索引涵盖的两个方向,所以你不需要辅助索引。

  2. TIE:存储要求。

  3. TIE:

  1. TIE: One index covers both directions, so you don't need a secondary index.
  2. TIE: Storage requirements.
  3. TIE: Performance.


方法2(只有一个方向存储在表格中):

Approach 2 (only one direction stored in the table):


  • CON:更复杂的查询。

  • PRO:无法通过忘记处理来破坏数据相反的方向,因为没有相反的方向

  • MINOR CON:需要 CHECK(UID< FriendID),所以不能以两种不同的方式表示相同的友谊,(UID,FriendID)上的键可以完成其工作。

  • 需要进一步分析:

  • CON: More complicated queries.
  • PRO: Can't corrupt the data by forgetting to handle the opposite direction, since there is no opposite direction.
  • MINOR CON: Requires CHECK(UID < FriendID), so a same friendship can never be represented in two different ways, and the key on (UID, FriendID) can do its job.
  • Further analysis needed:

  1. TIE: cover 查询的两个方向( {UID,FriendID} 和复合索引 {FriendID,UID} )。

  2. TIE:存储要求。

  3. TIE:Performance。

  1. TIE: Two indexes are necessary to cover both directions of querying (composite index on {UID, FriendID} and composite index on {FriendID, UID}).
  2. TIE: Storage requirements.
  3. TIE: Performance.


是特别感兴趣的。始终使用MySQL / InnoDB 集群数据,并且辅助索引在集群表中可能是昂贵的(请参阅这篇文章),所以看起来好像方法2中的次要索引会占用更少行的所有优点。 然而,辅助索引包含与主要字段完全相同的字段(仅以相反的顺序),因此在此特定情况下不存在存储开销。还没有指向表堆的指针(因为没有表堆),所以它可能甚至比基于堆的索引更便宜的存储。并且假设查询被索引覆盖,则不会在集群表中通常与辅助索引相关联的双重查找。所以,这基本上是一个关系(既不是方法1也不是方法2有显着的优势)。

The point 1 is of special interest. MySQL/InnoDB always clusters data, and secondary indexes can be expensive in clustered tables (see "Disadvantages of clustering" in this article), so it might seem as if the secondary index in approach 2 would eat-up all the advantages of fewer rows. However, the secondary index contains the exact same fields as the primary (only in the opposite order) so there is no storage overhead in this particular case. There is also no pointer to table heap (since there is no table heap), so it's probably even cheaper storage-wise that a normal heap-based index. And assuming the query is covered with the index, there won't be a double-lookup normally associated with a secondary index in a clustered table either. So, this is basically a tie (neither approach 1 nor approach 2 has significant advantage).

点2 与点1 :我们是否会有一个N值的B树或两个B树,每个都有N / 2个值。所以这也是一个关系:两种方法都会消耗大约相同的存储空间。

The point 2 is related to the point 1: it doesn't matter whether we will have a B-Tree of N values or two B-Trees, each with N/2 values. So this is also a tie: both approaches will use-up approximately same amount of storage.

同样的推理适用于第3点:是否我们搜索一个较大的B-Tree或2个较小的B-Tree,并没有太大的区别,所以这也是一个关系。

The same reasoning applies to point 3: whether we search one larger B-Tree or 2 smaller ones, doesn't make much of a difference, so this is also a tie.

所以,为了健壮性,尽管查询有些较为粗糙,还需要额外的 CHECK ,我会采用方法2。

So, for the robustness, and despite somewhat uglier queries and a need for additional CHECK, I'd go with the approach 2.

这篇关于如何在像MySQL这样的RDBMS中存储双向关系?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆