Optimising MySQL queries across hierarchical data

Views: 173  Published: 2017/3/22 0:11:29  Tags: mysql, sql, database-design, data-structures, query-optimization

Question

I have a fairly stable directed graph of order ~100k vertices and size ~1k edges. It is two-dimensional insofar as its vertices can be identified by a pair of integers (x, y) (of cardinality ~100 x ~1000) and all edges are strictly increasing in x.

There is furthermore a dictionary of ~1k (key, val) pairs associated with each vertex.

I am currently storing the graph in a MySQL database across three (InnoDB) tables: a table of vertices (which I don't think is relevant to my question, so I have omitted both it and the foreign key constraints that refer to it from the extracts below); a table which holds the dictionaries; and a 'closure table' of connected vertices as described so eloquently by Bill Karwin.
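A closure table stores one row per connected (ancestor, descendant) pair rather than one row per edge. The following is a minimal sketch of how `ConnectedVertices` rows could be derived from an edge list. It assumes that zero-length (self) paths are excluded by default; Karwin's own examples also store a row linking each node to itself, which `include_self=True` adds. `closure_rows` and the sample edges are hypothetical illustrations, not the asker's code.

```python
from collections import defaultdict


def closure_rows(edges, include_self=False):
    """Return every (tail_x, tail_y, head_x, head_y) pair joined by a directed path.

    `edges` holds ((x1, y1), (x2, y2)) tuples. Because all edges are strictly
    increasing in x, the graph is acyclic and a DFS from each vertex terminates.
    """
    succ = defaultdict(set)
    vertices = set()
    for tail, head in edges:
        succ[tail].add(head)
        vertices.update((tail, head))

    rows = set()
    for start in vertices:
        if include_self:
            # Optional zero-length path from a vertex to itself.
            rows.add((start[0], start[1], start[0], start[1]))
        stack = list(succ.get(start, ()))
        seen = set()
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            rows.add((start[0], start[1], v[0], v[1]))
            stack.extend(succ.get(v, ()))
    return rows


edges = [((1, 0), (2, 5)), ((2, 5), (3, 7)), ((1, 1), (3, 7))]
print(sorted(closure_rows(edges)))
# [(1, 0, 2, 5), (1, 0, 3, 7), (1, 1, 3, 7), (2, 5, 3, 7)]
```

The question's `PRIMARY KEY (tail_x, tail_y, head_x)` also implies that, from any given tail, at most one vertex per column x is reachable. This sketch does not enforce that constraint.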

The vertex dictionary table is defined as follows:

CREATE TABLE `VertexDictionary` (
  `x`   smallint(6) unsigned NOT NULL,
  `y`   smallint(6) unsigned NOT NULL,
  `key` varchar(50) NOT NULL DEFAULT '',
  `val` smallint(1) DEFAULT NULL,
  PRIMARY KEY (`x`, `y`  , `key`),
  KEY  `dict` (`x`, `key`, `val`)
);

and the closure table of connected vertices as:

CREATE TABLE `ConnectedVertices` (
  `tail_x` smallint(6) unsigned NOT NULL,
  `tail_y` smallint(6) unsigned NOT NULL,
  `head_x` smallint(6) unsigned NOT NULL,
  `head_y` smallint(6) unsigned NOT NULL,
  PRIMARY KEY   (`tail_x`, `tail_y`, `head_x`),
  KEY `reverse` (`head_x`, `head_y`, `tail_x`),
  KEY `fx` (`tail_x`, `head_x`),
  KEY `rx` (`head_x`, `tail_x`)
);

There is also a dictionary of (x, key) pairs such that for each such pair, all vertices identified with that x have within their dictionaries a value for that key. This dictionary is stored in a fourth table:

CREATE TABLE `SpecialKeys` (
  `x`   smallint(6) unsigned NOT NULL,
  `key` varchar(50) NOT NULL DEFAULT '',
  PRIMARY KEY (`x`),
  KEY `xkey`  (`x`, `key`)
);
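The invariant on `SpecialKeys` can be checked mechanically. Below is a minimal sketch that runs against an in-memory SQLite database as a stand-in for MySQL, with simplified types and hypothetical toy rows. Because the question omits the vertex table, the sketch assumes the set of vertices is the distinct (x, y) pairs in `VertexDictionary`. The check itself uses only plain joins and `NOT EXISTS`, so the same SQL should also run on MySQL.

```python
import sqlite3


def special_key_violations(conn):
    """Return (x, y, key) triples where vertex (x, y) lacks the SpecialKeys key for its x.

    Vertices are taken from VertexDictionary itself (assumption: the real
    vertex table is not shown in the question).
    """
    return conn.execute("""
        SELECT DISTINCT d.x, d.y, k.`key`
        FROM VertexDictionary AS d
        JOIN SpecialKeys AS k ON k.x = d.x
        WHERE NOT EXISTS (
            SELECT 1 FROM VertexDictionary AS e
            WHERE e.x = d.x AND e.y = d.y AND e.`key` = k.`key`)
        ORDER BY d.x, d.y
    """).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE VertexDictionary (x INTEGER, y INTEGER, `key` TEXT,
                                   val INTEGER, PRIMARY KEY (x, y, `key`));
    CREATE TABLE SpecialKeys (x INTEGER PRIMARY KEY, `key` TEXT);
    -- Hypothetical data: vertex (1, 1) is missing the special key 'colour'.
    INSERT INTO SpecialKeys VALUES (1, 'colour');
    INSERT INTO VertexDictionary VALUES
        (1, 0, 'colour', 3), (1, 0, 'size', 9),
        (1, 1, 'size', 4);
""")
print(special_key_violations(conn))  # [(1, 1, 'colour')]
```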

I often wish to extract the set of keys used in the dictionaries of all vertices having a particular x=X, together with the associated value of any SpecialKeys connected to the left:

SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
       `ConnectedVertices` AS `c`
  JOIN `VertexDictionary`  AS `u` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
  JOIN `VertexDictionary`  AS `v` ON (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
  JOIN `SpecialKeys`       AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
WHERE
  `v`.`x` = X
;
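The following sketch runs the query against a tiny hypothetical dataset in an in-memory SQLite database to show its semantics. SQLite accepts the backtick quoting and row-value comparisons used here. The sketch shows what the query returns, not how MySQL executes it. `X` becomes the `?` parameter, and the `ORDER BY` is added only so that the printed output is deterministic.

```python
import sqlite3

SCHEMA_AND_DATA = """
CREATE TABLE VertexDictionary (x INTEGER, y INTEGER, `key` TEXT, val INTEGER,
                               PRIMARY KEY (x, y, `key`));
CREATE TABLE ConnectedVertices (tail_x INTEGER, tail_y INTEGER,
                                head_x INTEGER, head_y INTEGER,
                                PRIMARY KEY (tail_x, tail_y, head_x));
CREATE TABLE SpecialKeys (x INTEGER PRIMARY KEY, `key` TEXT);

-- Hypothetical toy data: column x = 1 has the special key 'colour'.
INSERT INTO SpecialKeys VALUES (1, 'colour');
INSERT INTO VertexDictionary VALUES
  (1, 0, 'colour', 3), (1, 0, 'size', 9),
  (1, 1, 'colour', 5),
  (2, 0, 'a', 1), (2, 0, 'b', 2),
  (2, 1, 'c', 7);
INSERT INTO ConnectedVertices VALUES (1, 0, 2, 0), (1, 1, 2, 1);
"""

QUERY = """
SELECT DISTINCT v.`key`, u.val
FROM ConnectedVertices AS c
JOIN VertexDictionary AS u ON (u.x, u.y) = (c.tail_x, c.tail_y)
JOIN VertexDictionary AS v ON (v.x, v.y) = (c.head_x, c.head_y)
JOIN SpecialKeys AS k ON (k.x, k.`key`) = (u.x, u.`key`)
WHERE v.x = ?
ORDER BY v.`key`
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_AND_DATA)
rows = conn.execute(QUERY, (2,)).fetchall()
print(rows)  # [('a', 3), ('b', 3), ('c', 5)]
```

Each key of a vertex in column 2 is paired with the special-key value of the vertex connected to it on the left.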

The EXPLAIN output for this query is:


id   select_type   table   type     possible_keys           key       key_len   ref                                rows   Extra
 1   SIMPLE        k       index    PRIMARY,xkey            xkey          154   NULL                                 40   Using index; Using temporary
 1   SIMPLE        c       ref      PRIMARY,reverse,fx,rx   PRIMARY         2   db.k.x                                1   Using where
 1   SIMPLE        v       ref      PRIMARY,dict            PRIMARY         4   const,db.c.head_y                   136   Using index
 1   SIMPLE        u       eq_ref   PRIMARY,dict            PRIMARY       156   db.c.tail_x,db.c.tail_y,db.k.key      1   Using where
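The MySQL manual suggests taking the product of the `rows` column as a rough indication of how many row combinations a join must examine. For the plan above that product is small, which contrasts with the ~10s runtime. Keep in mind that these figures are estimates derived from sampled index statistics (`innodb_stats_sample_pages` is 8 in the variables listed below). Note also that table `c` is accessed with `key_len` 2, i.e. only the `tail_x` column of its primary key is used. The following is plain arithmetic on the numbers shown:

```python
from math import prod

# The rows column of the EXPLAIN output above, in plan order (k, c, v, u).
explain_rows = {"k": 40, "c": 1, "v": 136, "u": 1}

# Rough estimate of row combinations examined by the nested-loop join.
estimate = prod(explain_rows.values())
print(estimate)  # 5440
```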

But this query takes ~10s to complete. Been banging my head against a brick wall trying to improve matters, but to no avail.

Can the query be improved, or should I consider a different data structure? Extremely grateful for your thoughts!

Update

I'm still getting nowhere with this, although I did rebuild the tables and found the EXPLAIN output to be slightly different (as now shown above, the number of rows fetched from v had increased from 1 to 136!); the query is still taking ~10s to execute.

I really don't understand what's going on here. Queries to obtain all (x, y, SpecialValue) and all (x, y, key) tuples are both very fast (~30ms and ~150ms respectively), yet essentially joining the two takes over fifty times longer than their combined time... how can I improve the time taken to perform that join?

The output of SHOW VARIABLES LIKE '%innodb%'; is below:


Variable_name                    Value
------------------------------------------------------------
have_innodb                      YES
ignore_builtin_innodb            ON
innodb_adaptive_flushing         ON
innodb_adaptive_hash_index       ON
innodb_additional_mem_pool_size  2097152
innodb_autoextend_increment      8
innodb_autoinc_lock_mode         1
innodb_buffer_pool_size          1179648000
innodb_change_buffering          inserts
innodb_checksums                 ON
innodb_commit_concurrency        0
innodb_concurrency_tickets       500
innodb_data_file_path            ibdata1:10M:autoextend
innodb_data_home_dir             /rdsdbdata/db/innodb
innodb_doublewrite               ON
innodb_fast_shutdown             1
innodb_file_format               Antelope
innodb_file_format_check         Barracuda
innodb_file_per_table            ON
innodb_flush_log_at_trx_commit   1
innodb_flush_method              O_DIRECT
innodb_force_recovery            0
innodb_io_capacity               200
innodb_lock_wait_timeout         50
innodb_locks_unsafe_for_binlog   OFF
innodb_log_buffer_size           8388608
innodb_log_file_size             134217728
innodb_log_files_in_group        2
innodb_log_group_home_dir        /rdsdbdata/log/innodb
innodb_max_dirty_pages_pct       75
innodb_max_purge_lag             0
innodb_mirrored_log_groups       1
innodb_old_blocks_pct            37
innodb_old_blocks_time           0
innodb_open_files                300
innodb_read_ahead_threshold      56
innodb_read_io_threads           4
innodb_replication_delay         0
innodb_rollback_on_timeout       OFF
innodb_spin_wait_delay           6
innodb_stats_method              nulls_equal
innodb_stats_on_metadata         ON
innodb_stats_sample_pages        8
innodb_strict_mode               OFF
innodb_support_xa                ON
innodb_sync_spin_loops           30
innodb_table_locks               ON
innodb_thread_concurrency        0
innodb_thread_sleep_delay        10000
innodb_use_sys_malloc            ON
innodb_version                   1.0.16
innodb_write_io_threads          4
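Putting the question's figures together gives a rough sense of scale. This is back-of-envelope arithmetic only. It assumes the "~" figures are accurate and that every vertex really has ~1k dictionary entries.

```python
vertices = 100_000          # "order ~100k vertices"
pairs_per_vertex = 1_000    # "~1k (key, val) pairs associated with each vertex"
x_values = 100              # cardinality of x, "~100"

dictionary_rows = vertices * pairs_per_vertex     # rows in VertexDictionary
rows_per_x = dictionary_rows // x_values          # rows sharing one value of x
buffer_pool_bytes = 1_179_648_000                 # innodb_buffer_pool_size above

print(dictionary_rows)                                 # 100000000
print(rows_per_x)                                      # 1000000
print(buffer_pool_bytes / 2**20)                       # 1125.0 (MiB)
print(round(buffer_pool_bytes / dictionary_rows, 1))   # 11.8 (bytes per row)
```

On those assumptions `VertexDictionary` holds on the order of 100 million rows, which leaves under 12 bytes of buffer pool per row. That is less than the 4 bytes of `(x, y)` plus the key string plus InnoDB's own per-record overhead, so the table (let alone its secondary `dict` index) probably cannot stay cached in memory.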

Recommended answer

I haven't spent time testing this (you provided an incomplete example), but you should definitely try reordering the joined tables. The EXPLAIN output provides some information; let's say that ordering by key_len should heuristically be fastest. I believe the first table to be filtered on should be listed last, in case the optimizer is not able to figure that out.

So, let's say 'c, v, k, u' order is the best.

SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
       `VertexDictionary`  AS `u`
  JOIN `SpecialKeys`       AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
  JOIN `VertexDictionary`  AS `v` ON `v`.`x` = X
  JOIN `ConnectedVertices` AS `c` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
                                 AND (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
WHERE
  `v`.`x` = X
;

The 'rows' column would suggest a 'c/u, k, v' order, but that depends on the data:

SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
       `VertexDictionary`  AS `u`
  JOIN `VertexDictionary`  AS `v` ON `v`.`x` = X
  JOIN `SpecialKeys`       AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
  JOIN `ConnectedVertices` AS `c` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
                                 AND (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
WHERE
  `v`.`x` = X
;
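Reordering inner joins should change only the plan, never the result. The following sketch checks this on a tiny hypothetical dataset in SQLite by comparing the question's query with both reorderings. Here `v` has its own `ON` condition, so every join clause refers only to tables that are already joined. SQLite's planner picks its own join order, so this verifies equivalence only. Timing comparisons have to be made on the real MySQL data, where `SELECT STRAIGHT_JOIN` (which joins tables in the order they are listed) is one way to pin down a particular order.

```python
import sqlite3

SETUP = """
CREATE TABLE VertexDictionary (x INTEGER, y INTEGER, `key` TEXT, val INTEGER,
                               PRIMARY KEY (x, y, `key`));
CREATE TABLE ConnectedVertices (tail_x INTEGER, tail_y INTEGER,
                                head_x INTEGER, head_y INTEGER,
                                PRIMARY KEY (tail_x, tail_y, head_x));
CREATE TABLE SpecialKeys (x INTEGER PRIMARY KEY, `key` TEXT);
INSERT INTO SpecialKeys VALUES (1, 'colour');
INSERT INTO VertexDictionary VALUES
  (1, 0, 'colour', 3), (1, 0, 'size', 9), (1, 1, 'colour', 5),
  (2, 0, 'a', 1), (2, 0, 'b', 2), (2, 1, 'c', 7);
INSERT INTO ConnectedVertices VALUES (1, 0, 2, 0), (1, 1, 2, 1);
"""

QUERIES = {
    # Order from the question: c, u, v, k.
    "original": """
        SELECT DISTINCT v.`key`, u.val
        FROM ConnectedVertices AS c
        JOIN VertexDictionary AS u ON (u.x, u.y) = (c.tail_x, c.tail_y)
        JOIN VertexDictionary AS v ON (v.x, v.y) = (c.head_x, c.head_y)
        JOIN SpecialKeys AS k ON (k.x, k.`key`) = (u.x, u.`key`)
        WHERE v.x = ?""",
    # First reordering from the answer: u, k, v, c.
    "reordered_1": """
        SELECT DISTINCT v.`key`, u.val
        FROM VertexDictionary AS u
        JOIN SpecialKeys AS k ON (k.x, k.`key`) = (u.x, u.`key`)
        JOIN VertexDictionary AS v ON v.x = ?
        JOIN ConnectedVertices AS c ON (u.x, u.y) = (c.tail_x, c.tail_y)
                                   AND (v.x, v.y) = (c.head_x, c.head_y)""",
    # Second reordering from the answer: u, v, k, c.
    "reordered_2": """
        SELECT DISTINCT v.`key`, u.val
        FROM VertexDictionary AS u
        JOIN VertexDictionary AS v ON v.x = ?
        JOIN SpecialKeys AS k ON (k.x, k.`key`) = (u.x, u.`key`)
        JOIN ConnectedVertices AS c ON (u.x, u.y) = (c.tail_x, c.tail_y)
                                   AND (v.x, v.y) = (c.head_x, c.head_y)""",
}

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
X = 2  # the column of interest
results = {name: set(conn.execute(sql, (X,)).fetchall())
           for name, sql in QUERIES.items()}
print(results["original"] == results["reordered_1"] == results["reordered_2"])  # True
```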

Hope this helps.

UPDATE (avoiding the varchar join):

SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
       `ConnectedVertices` AS `c`
  JOIN `VertexDictionary`  AS `u` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
  JOIN `VertexDictionary`  AS `v` ON (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
WHERE
  (`u`.`x`, `u`.`key`) IN (SELECT `k`.`x`, `k`.`key` FROM `SpecialKeys` AS `k`)
AND
  `v`.`x` = X
;
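The `IN` rewrite should return the same rows as the original join, because `SELECT DISTINCT` removes any duplicates the join could introduce. The following sketch checks this on a tiny hypothetical dataset in SQLite, which supports row-value `IN (SELECT ...)`. It verifies results only, not speed. The variables above (InnoDB 1.0.16, `have_innodb`) suggest a MySQL 5.1 server, and before MySQL 5.6 an `IN (SELECT ...)` was typically executed as a DEPENDENT SUBQUERY. It is therefore worth checking EXPLAIN for this form on the real server.

```python
import sqlite3

SETUP = """
CREATE TABLE VertexDictionary (x INTEGER, y INTEGER, `key` TEXT, val INTEGER,
                               PRIMARY KEY (x, y, `key`));
CREATE TABLE ConnectedVertices (tail_x INTEGER, tail_y INTEGER,
                                head_x INTEGER, head_y INTEGER,
                                PRIMARY KEY (tail_x, tail_y, head_x));
CREATE TABLE SpecialKeys (x INTEGER PRIMARY KEY, `key` TEXT);
INSERT INTO SpecialKeys VALUES (1, 'colour');
INSERT INTO VertexDictionary VALUES
  (1, 0, 'colour', 3), (1, 0, 'size', 9), (1, 1, 'colour', 5),
  (2, 0, 'a', 1), (2, 0, 'b', 2), (2, 1, 'c', 7);
INSERT INTO ConnectedVertices VALUES (1, 0, 2, 0), (1, 1, 2, 1);
"""

# Original form: SpecialKeys joined on (x, key).
JOIN_VERSION = """
    SELECT DISTINCT v.`key`, u.val
    FROM ConnectedVertices AS c
    JOIN VertexDictionary AS u ON (u.x, u.y) = (c.tail_x, c.tail_y)
    JOIN VertexDictionary AS v ON (v.x, v.y) = (c.head_x, c.head_y)
    JOIN SpecialKeys AS k ON (k.x, k.`key`) = (u.x, u.`key`)
    WHERE v.x = ?"""

# Rewritten form: SpecialKeys moved into a row-value IN subquery.
IN_VERSION = """
    SELECT DISTINCT v.`key`, u.val
    FROM ConnectedVertices AS c
    JOIN VertexDictionary AS u ON (u.x, u.y) = (c.tail_x, c.tail_y)
    JOIN VertexDictionary AS v ON (v.x, v.y) = (c.head_x, c.head_y)
    WHERE (u.x, u.`key`) IN (SELECT k.x, k.`key` FROM SpecialKeys AS k)
      AND v.x = ?"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
with_join = set(conn.execute(JOIN_VERSION, (2,)).fetchall())
with_in = set(conn.execute(IN_VERSION, (2,)).fetchall())
print(with_join == with_in)  # True
print(sorted(with_in))       # [('a', 3), ('b', 3), ('c', 5)]
```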
