Optimising MySQL queries across hierarchical data

Question

I have a fairly stable directed graph of order ~100k vertices and size ~1k edges. It is two-dimensional insofar as its vertices can be identified by a pair of integers (x, y) (of cardinality ~100 x ~1000) and all edges are strictly increasing in x.

Furthermore, each vertex has an associated dictionary of ~1k (key, val) pairs.

I am currently storing the graph in a MySQL database across three (InnoDB) tables: a table of vertices (which I don't think is relevant to my question, so I have left both it and the foreign key constraints that refer to it out of the extracts below); a table that holds the dictionaries; and a 'closure table' of connected vertices, as described so eloquently by Bill Karwin.

The vertex dictionary table is defined as:

CREATE TABLE `VertexDictionary` (
  `x`   smallint(6) unsigned NOT NULL,
  `y`   smallint(6) unsigned NOT NULL,
  `key` varchar(50) NOT NULL DEFAULT '',
  `val` smallint(1) DEFAULT NULL,
  PRIMARY KEY (`x`, `y`  , `key`),
  KEY  `dict` (`x`, `key`, `val`)
);

and the closure table of connected vertices as:

CREATE TABLE `ConnectedVertices` (
  `tail_x` smallint(6) unsigned NOT NULL,
  `tail_y` smallint(6) unsigned NOT NULL,
  `head_x` smallint(6) unsigned NOT NULL,
  `head_y` smallint(6) unsigned NOT NULL,
  PRIMARY KEY   (`tail_x`, `tail_y`, `head_x`),
  KEY `reverse` (`head_x`, `head_y`, `tail_x`),
  KEY `fx` (`tail_x`, `head_x`),
  KEY `rx` (`head_x`, `tail_x`)
);
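
Because a closure table stores every connected (tail, head) pair rather than only the direct edges, finding all vertices connected to the left of a given vertex takes a single indexed lookup. Below is a minimal sketch that uses hypothetical coordinates (42, 7) for the head vertex. The `reverse` index (head_x, head_y, tail_x) serves the WHERE clause. InnoDB secondary indexes also carry the primary-key columns, so tail_y should be readable from that same index.

```sql
-- Hypothetical head vertex (x = 42, y = 7); substitute real coordinates.
SET @hx = 42, @hy = 7;

-- All tail vertices from which (@hx, @hy) is reachable.
-- The WHERE clause matches a prefix of the `reverse` index.
SELECT `tail_x`, `tail_y`
FROM   `ConnectedVertices`
WHERE  `head_x` = @hx
  AND  `head_y` = @hy;
```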

There is also a dictionary of (x, key) pairs such that, for each such pair, every vertex identified with that x has a value for that key in its dictionary. This dictionary is stored in a fourth table:

CREATE TABLE `SpecialKeys` (
  `x`   smallint(6) unsigned NOT NULL,
  `key` varchar(50) NOT NULL DEFAULT '',
  PRIMARY KEY (`x`),
  KEY `xkey`  (`x`, `key`)
);

I often wish to extract the set of keys used in the dictionaries of all vertices having a particular x = X, together with the associated values of any SpecialKeys held by the vertices connected to them on the left (i.e. the tail vertices, which have smaller x):

SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
       `ConnectedVertices` AS `c`
  JOIN `VertexDictionary`  AS `u` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
  JOIN `VertexDictionary`  AS `v` ON (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
  JOIN `SpecialKeys`       AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
WHERE
  `v`.`x` = X
;
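
In the query above, X stands for a literal value. One way to run the query with X supplied as a parameter is a server-side prepared statement. Both the statement name `keys_at_x` and the value 42 are hypothetical.

```sql
-- Same query as above, with X replaced by a placeholder.
PREPARE keys_at_x FROM '
  SELECT DISTINCT `v`.`key`, `u`.`val`
  FROM        `ConnectedVertices` AS `c`
  JOIN `VertexDictionary` AS `u` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
  JOIN `VertexDictionary` AS `v` ON (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
  JOIN `SpecialKeys`      AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
  WHERE `v`.`x` = ?';

SET @X = 42;                 -- hypothetical value of X
EXECUTE keys_at_x USING @X;  -- same rows as the literal version
DEALLOCATE PREPARE keys_at_x;
```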

for which the EXPLAIN output is:


id   select_type   table   type     possible_keys           key       key_len   ref                                rows   Extra
 1   SIMPLE        k       index    PRIMARY,xkey            xkey          154   NULL                                 40   Using index; Using temporary
 1   SIMPLE        c       ref      PRIMARY,reverse,fx,rx   PRIMARY         2   db.k.x                                1   Using where
 1   SIMPLE        v       ref      PRIMARY,dict            PRIMARY         4   const,db.c.head_y                   136   Using index
 1   SIMPLE        u       eq_ref   PRIMARY,dict            PRIMARY       156   db.c.tail_x,db.c.tail_y,db.k.key      1   Using where

But this query takes ~10s to complete. I've been banging my head against a brick wall trying to improve matters, but to no avail.

Can the query be improved, or should I consider a different data structure? Extremely grateful for your thoughts!

Update

I'm still getting nowhere with this, although I did rebuild the tables and found the EXPLAIN output to be slightly different (as now shown above, the number of rows fetched from v has increased from 1 to 136!); the query is still taking ~10s to execute.
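
The rows column in EXPLAIN is an estimate based on index statistics. Those statistics may change after a rebuild because InnoDB samples them: in the variables listed further down, innodb_stats_sample_pages is 8, meaning only 8 index pages are sampled per estimate. The following is a minimal sketch that refreshes the statistics explicitly and then re-checks the plan, with X = 42 as a hypothetical value.

```sql
-- Recalculate index statistics for the three tables in the query.
ANALYZE TABLE `VertexDictionary`, `ConnectedVertices`, `SpecialKeys`;

-- Re-check the plan; `rows` is an estimate derived from those statistics.
EXPLAIN
SELECT DISTINCT `v`.`key`, `u`.`val`
FROM   `ConnectedVertices` AS `c`
JOIN   `VertexDictionary`  AS `u` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
JOIN   `VertexDictionary`  AS `v` ON (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
JOIN   `SpecialKeys`       AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
WHERE  `v`.`x` = 42;  -- hypothetical value of X
```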

I really don't understand what's going on here. Queries to obtain all (x, y, SpecialValue) and all (x, y, key) tuples are both very fast (~30ms and ~150ms respectively), yet essentially joining the two takes over fifty times longer than their combined time... how can I improve the time taken to perform that join?

The output of SHOW VARIABLES LIKE '%innodb%'; is below:


Variable_name                    Value
------------------------------------------------------------
have_innodb                      YES
ignore_builtin_innodb            ON
innodb_adaptive_flushing         ON
innodb_adaptive_hash_index       ON
innodb_additional_mem_pool_size  2097152
innodb_autoextend_increment      8
innodb_autoinc_lock_mode         1
innodb_buffer_pool_size          1179648000
innodb_change_buffering          inserts
innodb_checksums                 ON
innodb_commit_concurrency        0
innodb_concurrency_tickets       500
innodb_data_file_path            ibdata1:10M:autoextend
innodb_data_home_dir             /rdsdbdata/db/innodb
innodb_doublewrite               ON
innodb_fast_shutdown             1
innodb_file_format               Antelope
innodb_file_format_check         Barracuda
innodb_file_per_table            ON
innodb_flush_log_at_trx_commit   1
innodb_flush_method              O_DIRECT
innodb_force_recovery            0
innodb_io_capacity               200
innodb_lock_wait_timeout         50
innodb_locks_unsafe_for_binlog   OFF
innodb_log_buffer_size           8388608
innodb_log_file_size             134217728
innodb_log_files_in_group        2
innodb_log_group_home_dir        /rdsdbdata/log/innodb
innodb_max_dirty_pages_pct       75
innodb_max_purge_lag             0
innodb_mirrored_log_groups       1
innodb_old_blocks_pct            37
innodb_old_blocks_time           0
innodb_open_files                300
innodb_read_ahead_threshold      56
innodb_read_io_threads           4
innodb_replication_delay         0
innodb_rollback_on_timeout       OFF
innodb_spin_wait_delay           6
innodb_stats_method              nulls_equal
innodb_stats_on_metadata         ON
innodb_stats_sample_pages        8
innodb_strict_mode               OFF
innodb_support_xa                ON
innodb_sync_spin_loops           30
innodb_table_locks               ON
innodb_thread_concurrency        0
innodb_thread_sleep_delay        10000
innodb_use_sys_malloc            ON
innodb_version                   1.0.16
innodb_write_io_threads          4


Recommended answer

I haven't spent time testing this (you provided an incomplete example?), but you should definitely try reordering the joined tables. The EXPLAIN output provides some information; let's say that ordering by key_len should heuristically be fastest. I believe the first table to be filtered on should be listed last, in case the optimizer is not able to figure that out.

So, let's say the order 'c, v, k, u' is the best.

SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
  `VertexDictionary`  AS `u`
  JOIN `SpecialKeys`       AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
  JOIN `VertexDictionary`  AS `v`
  JOIN `ConnectedVertices` AS `c` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
           AND (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
WHERE
  `v`.`x` = X
;

The 'rows' column would instead suggest a 'c/u, k, v' order, but that depends on the data:

SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
  `VertexDictionary`  AS `u`
  JOIN `VertexDictionary`  AS `v`
  JOIN `SpecialKeys`       AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
  JOIN `ConnectedVertices` AS `c` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
                                 AND (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
WHERE
  `v`.`x` = X
;
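
One caveat applies to the reorderings above. For inner joins, MySQL's optimizer normally picks the join order itself, regardless of the order in which the tables are written. To test a specific order, use SELECT STRAIGHT_JOIN, which makes MySQL read the tables in the order they appear in FROM, leftmost first. The sketch below encodes the 'c, v, k, u' order literally, with X = 42 as a hypothetical value. Because an ON clause can only refer to tables that precede it, the condition on k is rewritten against c. The rewrite is equivalent, since u.x = c.tail_x.

```sql
SET @X = 42;  -- hypothetical value of X

SELECT DISTINCT STRAIGHT_JOIN
  `v`.`key`,
  `u`.`val`
FROM
       `ConnectedVertices` AS `c`
  JOIN `VertexDictionary`  AS `v` ON (`v`.`x`, `v`.`y`) = (`c`.`head_x`, `c`.`head_y`)
  -- k only needs c.tail_x here, because u.x = c.tail_x in the next join
  JOIN `SpecialKeys`       AS `k` ON `k`.`x` = `c`.`tail_x`
  JOIN `VertexDictionary`  AS `u` ON (`u`.`x`, `u`.`y`, `u`.`key`)
                                   = (`c`.`tail_x`, `c`.`tail_y`, `k`.`key`)
WHERE
  `v`.`x` = @X
;
```

Prefixing the statement with EXPLAIN shows whether the forced order is an improvement. Permuting the FROM list tries other orders, such as the 'c/u, k, v' alternative.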

Hope this helps.

UPDATE (avoiding the varchar join):

SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
       `ConnectedVertices` AS `c`
  JOIN `VertexDictionary`  AS `u` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
  JOIN `VertexDictionary`  AS `v` ON (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
WHERE
  (`u`.`x`, `u`.`key`) IN (SELECT `k`.`x`, `k`.`key` FROM `SpecialKeys` AS `k`)
AND
  `v`.`x` = X
;
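
The same semi-join can also be written as a correlated EXISTS. All the columns compared are NOT NULL, so the result set is identical to the IN version. Which form runs faster depends on the MySQL version and the data, so it is worth timing both. A sketch, again using X = 42 as a hypothetical value:

```sql
SET @X = 42;  -- hypothetical value of X

SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
       `ConnectedVertices` AS `c`
  JOIN `VertexDictionary`  AS `u` ON (`u`.`x`, `u`.`y`) = (`c`.`tail_x`, `c`.`tail_y`)
  JOIN `VertexDictionary`  AS `v` ON (`v`.`x`, `v`.`y`) = (`c`.`head_x`, `c`.`head_y`)
WHERE
  `v`.`x` = @X
  -- keep only tail rows whose key is the special key for that x
  AND EXISTS (
    SELECT 1
    FROM   `SpecialKeys` AS `k`
    WHERE  `k`.`x`   = `u`.`x`
      AND  `k`.`key` = `u`.`key`
  )
;
```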
