How to optimize an ORDER BY for a computed column on a MASSIVE MySQL table

Question

I have a very large (80+ million row) de-normalized MySQL table. A simplified schema looks like:


+-----------+-------------+--------------+--------------+
|    ID     |   PARAM1    |   PARAM2     |   PARAM3     |
+-----------+-------------+--------------+--------------+
|    1      |   .04       |    .87       |    .78       |
+-----------+-------------+--------------+--------------+
|    2      |   .12       |    .02       |    .76       |
+-----------+-------------+--------------+--------------+
|    3      |   .24       |    .92       |    .23       |
+-----------+-------------+--------------+--------------+
|    4      |   .65       |    .12       |    .01       |
+-----------+-------------+--------------+--------------+
|    5      |   .98       |    .45       |    .65       |
+-----------+-------------+--------------+--------------+

I'm trying to see if there's a way to optimize a query in which I apply a weight to each PARAM column (where weight is between 0 and 1) and then average them to come up with a computed value SCORE. Then I want to ORDER BY that computed SCORE column.

For example, assuming the weighting for PARAM1 is .5, the weighting for PARAM2 is .23 and the weighting for PARAM3 is .76, you would end up with something similar to:

SELECT ID, ((PARAM1 * .5) + (PARAM2 * .23) + (PARAM3 * .76)) / 3 AS SCORE
FROM sometable  -- FROM clause was missing; table name assumed from the simplified example below
ORDER BY SCORE DESC LIMIT 10

With some proper indexing, this is fast for basic queries, but I can't figure out a good way to speed up the above query on such a large table.
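
As a worked example, the weighted query can be run against just the five sample rows shown above. This is only a sketch: the inline derived table named sample is a stand-in for the real table, which the question does not name.

```sql
-- Compute the weighted SCORE for the five sample rows and sort by it.
-- "sample" is a hypothetical inline stand-in for the real 80M-row table.
SELECT ID,
       ((PARAM1 * .5) + (PARAM2 * .23) + (PARAM3 * .76)) / 3 AS SCORE
FROM (
    SELECT 1 AS ID, .04 AS PARAM1, .87 AS PARAM2, .78 AS PARAM3
    UNION ALL SELECT 2, .12, .02, .76
    UNION ALL SELECT 3, .24, .92, .23
    UNION ALL SELECT 4, .65, .12, .01
    UNION ALL SELECT 5, .98, .45, .65
) AS sample
ORDER BY SCORE DESC
LIMIT 10;
```

With these weights the rows come back in the order 5, 1, 2, 3, 4 (scores of roughly 0.3625, 0.2710, 0.2141, 0.1688 and 0.1201). Dividing by 3 scales every score equally, so it does not change the ordering.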

Details:

  • Each PARAM value is between 0 and 1
  • Each weight applied to the PARAMs is between 0 and 1
-EDIT-

A simplified version of this problem.

This runs in a reasonable amount of time:

SELECT value1, value2 
FROM sometable 
WHERE id = 1 
ORDER BY value2

This does not run in a reasonable amount of time:

 SELECT value1, (value2 * an_arbitrary_float) as value3 
 FROM sometable 
 WHERE id = 1 
 ORDER BY value3

Using the above example, is there any solution that allows me to do an ORDER BY without computing value3 ahead of time?
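
One observation that applies to this simplified case only: multiplying by a positive constant does not change relative order, so sorting on value2 returns rows in the same order as sorting on value3. A sketch, assuming an_arbitrary_float is a positive literal (0.37 here is made up) and that adding a composite index on (id, value2) is acceptable; the index name is hypothetical:

```sql
-- Hypothetical composite index: can let MySQL read the id = 1 rows
-- already ordered by value2 instead of sorting them afterwards.
CREATE INDEX idx_sometable_id_value2 ON sometable (id, value2);

SELECT value1, (value2 * 0.37) AS value3   -- 0.37 is an illustrative multiplier
FROM sometable
WHERE id = 1
ORDER BY value2;  -- same row order as ORDER BY value3 while the multiplier is > 0
```

For a negative multiplier the order reverses, so ORDER BY value2 DESC would be the equivalent. This shortcut does not carry over to the weighted sum of several PARAM columns, where no single column determines the order.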

Recommended answer

I've found 2 (sort of obvious) things that have helped speed this query up to a satisfactory level:

  1. Minimize the number of rows that need to be sorted. Using an index on the 'id' field and a subselect to trim the number of records first keeps the filesort on the computed column manageable. For example:

SELECT t.value1, (t.value2 * an_arbitrary_float) as SCORE
FROM (SELECT * FROM sometable WHERE id = 1) AS t 
ORDER BY SCORE DESC

  2. Try increasing sort_buffer_size in my.cnf to speed up those filesorts.
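
The two suggestions above might look like this in practice. The index name is hypothetical, 0.37 stands in for an_arbitrary_float, and the 4 MiB buffer size is an arbitrary illustration rather than a recommendation; suitable values depend on the workload.

```sql
-- 1. Index the filter column so only the id = 1 rows reach the sort.
--    (idx_sometable_id is a hypothetical name.)
CREATE INDEX idx_sometable_id ON sometable (id);

-- Inspect the plan to confirm the index on id is used before the sort.
EXPLAIN
SELECT t.value1, (t.value2 * 0.37) AS SCORE
FROM (SELECT * FROM sometable WHERE id = 1) AS t
ORDER BY SCORE DESC;

-- 2. Check the current sort buffer, then raise it for this session only.
SHOW VARIABLES LIKE 'sort_buffer_size';
SET SESSION sort_buffer_size = 4194304;  -- 4 MiB, in bytes (illustrative value)
```

Setting the variable per session makes it possible to experiment without changing the server-wide default in the configuration file.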
