通过重连接优化MySQL查询 [英] Optimising MySQL queries with heavy joins

查看:145
本文介绍了通过重连接优化MySQL查询的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前正在运行一个网站,可以在列表中追踪最新的分数和评分。该列表有数千个经常更新的条目,列表应该可以通过这些分数和评分列进行排序。

I currently run a site which tracks up-to-the-minute scores and ratings in a list. The list has thousands of entries that are updated frequently, and the list should be sortable by these score and ratings columns.

我的SQL获取此数据目前看起来像(大致):

My SQL for getting this data currently looks like (roughly):

SELECT e.*, SUM(sa.amount) AS score, AVG(ra.rating) AS rating
FROM entries e 
LEFT JOIN score_adjustments sa ON sa.entry_id = e.id
    HAVING sa.created BETWEEN ... AND ... 
LEFT JOIN rating_adjustments ra ON ra.entry_id = e.id
    HAVING ra.rating > 0 
ORDER BY score 
LIMIT 0, 10

简化):

entries:
    id: INT(11) PRIMARY
    ...other data...

score_adjustments:
    id: INT(11), PRIMARY
    entry_id: INT(11), INDEX, FOREIGN KEY (entries.id)
    created: DATETIME
    amount: INT(4)

rating_adjustments:
    id: INT(11), PRIMARY
    entry_id: INT(11), INDEX, FOREIGN KEY (entries.id)
    rating: DOUBLE

有约30万 score_adjustments 条目,它们每天增长约5000个。 rating_adjustments 约为1/4。

There are approx 300,000 score_adjustments entries and they grow at about 5,000 a day. The rating_adjustments is about 1/4 that.

现在,我不是DBA专家,但我猜调用 SUM() AVG()所有的时间不是一件好事 - 特别是当 sa ra 包含数十万条记录 - 对吗?

Now, I'm no DBA expert but I'm guessing calling SUM() and AVG() all the time isn't a good thing - especially when sa and ra contain hundreds of thousands of records - right?

我已经在查询上执行缓存,但我希望查询本身快速,但仍然尽可能最新。我想知道是否有人可以共享任何解决方案来优化这样的重型加入/聚合查询?如果需要,我愿意进行结构更改。

I already do caching on the query, but I want the query itself to be fast - yet still as up to date as possible. I was wondering if anyone could share any solutions to optimise heavy join/aggregation queries like this? I'm willing to make structural changes if necessary.

编辑1

添加更多有关查询的信息。

Added more info about the query.

推荐答案

您的数据不好 clustered

InnoDB会将行存储为关闭PK身体靠近在一起。由于您的子表使用代理PK,所以它们的行将被随机存储。当在主表中给定行进行计算的时候,DBMS必须遍及所有位置从子表收集相关行。

InnoDB will store rows with "close" PKs physically close together. Since your child tables use surrogate PKs, their rows will be stored in effect randomly. When the time comes to make calculations for the given row in the "master" table, DBMS must jump all over the place to gather the related rows from the child tables.

而不是替代键,尝试使用更多的自然键,父母的PK位于前沿,类似于:

Instead of surrogate keys, try using more "natural" keys, with the parent's PK in the leading edge, similar to this:

score_adjustments:
    entry_id: INT(11), FOREIGN KEY (entries.id)
    created: DATETIME
    amount: INT(4)
    PRIMARY KEY (entry_id, created)

rating_adjustments:
    entry_id: INT(11), FOREIGN KEY (entries.id)
    rating_no: INT(11)
    rating: DOUBLE
    PRIMARY KEY (entry_id, rating_no)

注意:这假设创建的分辨率是很好的,并且添加了 rating_no 以允许每个 entry_id 多个评级。这只是一个例子 - 您可以根据需要更改PK。

NOTE: This assumes created's resolution is fine enough and the rating_no was added to allow multiple ratings per entry_id. This is just an example - you may vary the PKs according to your needs.

这将强制属于同一个 entry_id 被物理上靠近在一起存储,因此可以通过PK /聚类密钥上的范围扫描和非常少的I / O来计算SUM或AVG。

This will "force" rows belonging to the same entry_id to be stored physically close together, so a SUM or AVG can be calculated by just a range scan on the PK/clustering key and with very few I/Os.

或者(例如,如果您使用的是不支持集群的MyISAM),覆盖具有索引的查询,所以在查询期间子表没有被触摸。

Alternatively (e.g. if you are using MyISAM that doesn't support clustering), cover the query with indexes so the child tables are not touched during querying at all.

除此之外,您可以对设计进行非规范化,并将当前结果缓存在父表中:

On top of that, you could denormalize your design, and cache the current results in the parent table:


  • 存储SUM score_adjustments.amount)作为物理字段,并且每次从 score_adjustments 插入,更新或删除行时通过触发器进行调整。

  • 将SUM(rating_adjustments.rating)作为S COUNT(rating_adjustments.rating)存储为C。当一行添加到 rating_adjustments 时,将其添加到S并增加C.计算运行时的S / C以获得平均值。处理更新和删除类​​似。

  • Store SUM(score_adjustments.amount) as a physical field and adjust it via triggers every time a row is inserted, updated or deleted from score_adjustments.
  • Store SUM(rating_adjustments.rating) as "S" and COUNT(rating_adjustments.rating) as "C". When a row is added to rating_adjustments, add it to S and increment C. Calculate S/C at run-time to get the average. Handle updates and deletes similarly.

这篇关于通过重连接优化MySQL查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆