How to optimize an ORDER BY for a computed column on a MASSIVE MySQL table
Question
I have a very large (80+ million row) denormalized MySQL table. A simplified schema looks like:
+-----------+-------------+--------------+--------------+
| ID | PARAM1 | PARAM2 | PARAM3 |
+-----------+-------------+--------------+--------------+
| 1 | .04 | .87 | .78 |
+-----------+-------------+--------------+--------------+
| 2 | .12 | .02 | .76 |
+-----------+-------------+--------------+--------------+
| 3 | .24 | .92 | .23 |
+-----------+-------------+--------------+--------------+
| 4 | .65 | .12 | .01 |
+-----------+-------------+--------------+--------------+
| 5 | .98 | .45 | .65 |
+-----------+-------------+--------------+--------------+
I'm trying to see if there's a way to optimize a query in which I apply a weight to each PARAM column (each weight is between 0 and 1) and then average the weighted values to get a computed value, SCORE. I then want to ORDER BY that computed SCORE column.
For example, assuming the weight for PARAM1 is .5, the weight for PARAM2 is .23, and the weight for PARAM3 is .76, you would end up with something like:
SELECT ID, ((PARAM1 * .5) + (PARAM2 * .23) + (PARAM3 * .76)) / 3 AS SCORE
FROM sometable
ORDER BY SCORE DESC LIMIT 10
With proper indexing, basic queries on this table are fast, but I can't figure out a good way to speed up the query above on such a large table.
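To make the scoring concrete, the five sample rows from the schema above can be inlined as a derived table, so the statement runs without the real table. This is only a sketch: the alias `sample` is a hypothetical name, and the weights are the example values from the question.

```sql
-- Weighted-average score over the five sample rows shown in the schema.
-- "sample" is a hypothetical alias; the real query would read from the actual table.
SELECT ID,
       ((PARAM1 * .5) + (PARAM2 * .23) + (PARAM3 * .76)) / 3 AS SCORE
FROM (
    SELECT 1 AS ID, .04 AS PARAM1, .87 AS PARAM2, .78 AS PARAM3
    UNION ALL SELECT 2, .12, .02, .76
    UNION ALL SELECT 3, .24, .92, .23
    UNION ALL SELECT 4, .65, .12, .01
    UNION ALL SELECT 5, .98, .45, .65
) AS sample
ORDER BY SCORE DESC
LIMIT 10;
```

For these rows the IDs come back in the order 5, 1, 2, 3, 4, with scores of roughly 0.3625, 0.2710, 0.2141, 0.1688, and 0.1201. Because the weights change from query to query, no index on the PARAM columns can supply this ordering directly, which is why the sort cost grows with the number of rows that reach the ORDER BY.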
Details:
- Each PARAM value is between 0 and 1
- Each weight applied to the PARAMs is between 0 and 1
-EDIT-
A simplified version of the problem follows.
This runs in a reasonable amount of time:
SELECT value1, value2
FROM sometable
WHERE id = 1
ORDER BY value2
This does not run in a reasonable amount of time:
SELECT value1, (value2 * an_arbitrary_float) as value3
FROM sometable
WHERE id = 1
ORDER BY value3
Using the above example, is there any solution that allows me to do the ORDER BY without computing value3 ahead of time?
Recommended answer
I've found two (fairly obvious) things that helped speed this query up to a satisfactory level:
- Minimize the number of rows that need to be sorted. By using an index on the 'id' field and a subselect to trim the number of records first, the filesort on the computed column is not that bad. For example:
SELECT t.value1, (t.value2 * an_arbitrary_float) as SCORE
FROM (SELECT * FROM sometable WHERE id = 1) AS t
ORDER BY SCORE DESC
- Try increasing sort_buffer_size in my.cnf to speed up those filesorts.
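Both suggestions can be checked from a MySQL client before touching the server configuration. The statements below are a sketch, not a tuning recommendation: 0.5 stands in for an_arbitrary_float, and the 4 MB buffer size is an arbitrary illustrative value.

```sql
-- 1. Confirm how many rows reach the sort. In the EXPLAIN output, the "rows"
--    estimate should be small, and "Using filesort" in the Extra column marks
--    the sort on the computed column.
EXPLAIN
SELECT t.value1, (t.value2 * 0.5) AS SCORE   -- 0.5 is a placeholder weight
FROM (SELECT * FROM sometable WHERE id = 1) AS t
ORDER BY SCORE DESC;

-- 2. Inspect the current sort buffer size (reported in bytes).
SHOW VARIABLES LIKE 'sort_buffer_size';

-- 3. Raise it for the current connection only (4194304 bytes = 4 MB,
--    an illustrative value), then re-run the query and compare timings.
SET SESSION sort_buffer_size = 4194304;
```

Setting the value per session keeps the experiment local to one connection. Since the buffer is allocated for each session that needs to sort, a large server-wide value in my.cnf multiplies across concurrent connections, so it is worth measuring before making the change permanent.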