MySQL storing undirected graph edges efficiently

Problem description

I want to store undirected graph edges (for example, for friends). To store and retrieve all friends of node a, one can use:

Create two rows per edge, query on one column per node:

+--------------------------+
| id | from_node | to_node |
+--------------------------+
| 1  |  a        |  b      |
| 2  |  b        |  a      |
+--------------------------+
SELECT * FROM `x` WHERE from_node = 'a'

Create one row per edge, use OR:

+--------------------------+
| id | node_a    | node_b  |
+--------------------------+
| 1  |  a        |  b      |
+--------------------------+
SELECT * FROM `y` WHERE node_a = 'a' OR node_b = 'a'

Which makes for more efficient lookups?

  • Table x with 2n rows, indices on from_node and to_node, lookup on one column
  • Table y with n rows, indices on node_a and node_b, lookup on both columns using OR
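To make the two layouts concrete, here is a minimal sketch that loads the same edges into both tables and checks that the two lookups return the same friends. It uses Python's built-in `sqlite3` with an in-memory database purely as a runnable stand-in for MySQL; the index names, the helper functions and the `CASE` expression that picks the "other" endpoint in table y are illustrative assumptions, not part of the original question.

```python
import sqlite3


def build_tables(edges):
    """Load the same undirected edges into both layouts (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Layout x: two rows per edge, one per direction.
        CREATE TABLE x (id INTEGER PRIMARY KEY, from_node TEXT, to_node TEXT);
        CREATE INDEX x_from ON x (from_node);
        CREATE INDEX x_to ON x (to_node);
        -- Layout y: one row per edge.
        CREATE TABLE y (id INTEGER PRIMARY KEY, node_a TEXT, node_b TEXT);
        CREATE INDEX y_a ON y (node_a);
        CREATE INDEX y_b ON y (node_b);
    """)
    for a, b in edges:
        conn.executemany(
            "INSERT INTO x (from_node, to_node) VALUES (?, ?)", [(a, b), (b, a)]
        )
        conn.execute("INSERT INTO y (node_a, node_b) VALUES (?, ?)", (a, b))
    return conn


def friends_x(conn, node):
    """Layout x: filter on a single column."""
    rows = conn.execute("SELECT to_node FROM x WHERE from_node = ?", (node,))
    return sorted(r[0] for r in rows)


def friends_y(conn, node):
    """Layout y: filter on either column, then return the opposite endpoint."""
    rows = conn.execute(
        "SELECT CASE WHEN node_a = ? THEN node_b ELSE node_a END "
        "FROM y WHERE node_a = ? OR node_b = ?",
        (node, node, node),
    )
    return sorted(r[0] for r in rows)


conn = build_tables([("a", "b"), ("a", "c"), ("b", "c"), ("d", "a")])
print(friends_x(conn, "a"))
print(friends_y(conn, "a"))
```

Both functions return the same set of neighbours; the difference the question asks about is only in how many rows are stored and how the lookup reaches them.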

Solution

If you optimise everything, then X will be fastest, assuming that you read data from disk and are querying for the friends of a single person. That's because you can arrange your data on disk so that the rows are ordered to match one index, namely the one you are querying. So, for a single person, you only need to do one disk seek. Y requires lookups on two indices, so it may need multiple seeks to retrieve the friends of even a single person (and disk access time usually dominates simple queries).

See clustered indices on Wikipedia (and in the MySQL manual).
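As a rough illustration of the clustering idea, the sketch below stores table x so that its rows are physically ordered by `(from_node, to_node)`. In MySQL's InnoDB the primary key is the clustered index, so declaring `PRIMARY KEY (from_node, to_node)` would have that effect there. To keep the example runnable without a server it uses Python's `sqlite3`, where a `WITHOUT ROWID` table plays a similar role. That substitution, the `plan` helper and the `UNION ALL` form of the y lookup are assumptions made for illustration only, and SQLite's planner output is not what MySQL's `EXPLAIN` would show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- x: rows are stored in primary-key order, so all edges leaving one
    -- node sit next to each other (stand-in for an InnoDB clustered index).
    CREATE TABLE x (
        from_node TEXT NOT NULL,
        to_node   TEXT NOT NULL,
        PRIMARY KEY (from_node, to_node)
    ) WITHOUT ROWID;
    -- y: one row per edge, with a separate secondary index per endpoint.
    CREATE TABLE y (
        id INTEGER PRIMARY KEY,
        node_a TEXT NOT NULL,
        node_b TEXT NOT NULL
    );
    CREATE INDEX y_a ON y (node_a);
    CREATE INDEX y_b ON y (node_b);
""")


def add_edge(a, b):
    """Insert one undirected edge between two distinct nodes into both layouts."""
    conn.executemany("INSERT INTO x VALUES (?, ?)", [(a, b), (b, a)])
    conn.execute("INSERT INTO y (node_a, node_b) VALUES (?, ?)", (a, b))


def plan(sql, params):
    """Return the human-readable steps of SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


# One range scan over the clustered key.
X_QUERY = "SELECT to_node FROM x WHERE from_node = ?"
# Two separate index lookups, one per endpoint column.
Y_QUERY = (
    "SELECT node_b FROM y WHERE node_a = ? "
    "UNION ALL SELECT node_a FROM y WHERE node_b = ?"
)

for a, b in [("a", "b"), ("a", "c"), ("d", "a")]:
    add_edge(a, b)

print(sorted(r[0] for r in conn.execute(X_QUERY, ("a",))))
print(sorted(r[0] for r in conn.execute(Y_QUERY, ("a", "a"))))
print(plan(X_QUERY, ("a",)))
print(plan(Y_QUERY, ("a", "a")))
```

The x lookup touches one contiguous key range, while the y lookup has to consult two different indexes and then fetch the matching rows, which is the extra work the answer refers to.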

If you are lucky enough to know that the data will always be in memory, then both will likely be "fast enough" (and even if the data are on disk they may be fast enough). I am not saying X is the best design, only that it can be made the most efficient.
