Self-Join Timeout With Large DataSets

Question

I'm using MySQL 5.1 and I'm trying to take the difference between the last two values for each set of rows with matching identifiers. Example:

CREATE TABLE tbl1 (id INT AUTO_INCREMENT PRIMARY KEY, team VARCHAR(10), score INT, INDEX (team));



DataSet
id team score
1 xyz 10
2 abc 15
3 abc 22
4 rst 98
5 xyz 67
6 xyz 48
7 rst 92


Query:

SELECT
    x.id,
    x.team,
    x.score - y.score score_diff
FROM
    (SELECT a.*, COUNT(*) rank
     FROM tbl1 a
     LEFT JOIN tbl1 b
     ON b.team = a.team
     AND b.id >= a.id
     GROUP BY id
     ORDER BY id DESC) x
JOIN
    (SELECT a.*, COUNT(*) rank
     FROM tbl1 a
     LEFT JOIN tbl1 b
     ON b.team = a.team
     AND b.id >= a.id
     GROUP BY id
     ORDER BY id DESC)y
ON
    y.team = x.team
    AND y.rank = x.rank + 1
GROUP BY
    x.team
ORDER BY
    id DESC;


This code runs just fine with a small dataset, but when I run it against a dataset containing thousands of rows it dies! Can anyone show me the correct way, or a more efficient way, to write this query?

ANY help/advice would be GREATLY appreciated!

Thanks,

-Donald

Recommended answer

You may be able to speed up the query immensely by indexing your table appropriately. Look at this.

A simple Google search on indexing also turns up lots of references for this. Cheers.
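As a rough sketch of what "appropriate indexing" could look like for the table in the question: the self-join matches rows on team and then compares id, so a composite index on (team, id) is a natural candidate. The index name idx_team_id and the rewritten query below are assumptions made for illustration. They are not part of the original answer and have not been profiled against the asker's data.

```sql
-- Hypothetical composite index: lets MySQL answer "same team, lower id"
-- lookups from the index instead of scanning every row of the team.
ALTER TABLE tbl1 ADD INDEX idx_team_id (team, id);

-- One possible rewrite that avoids ranking every row twice:
-- find each team's latest row, then look up the row just before it.
SELECT
    cur.id,
    cur.team,
    cur.score - prev.score AS score_diff
FROM
    (SELECT team, MAX(id) AS last_id
     FROM tbl1
     GROUP BY team) latest
JOIN tbl1 cur
    ON cur.id = latest.last_id
JOIN tbl1 prev
    ON prev.team = cur.team
WHERE
    -- the previous row is the highest id below the latest one for that team
    prev.id = (SELECT MAX(p.id)
               FROM tbl1 p
               WHERE p.team = cur.team
                 AND p.id < cur.id)
ORDER BY
    cur.id DESC;
```

For the sample data in the question this should return (7, rst, -6), (6, xyz, -19) and (3, abc, 7). Teams with only one row are dropped, as in the original inner join. Running EXPLAIN on the query is a reasonable way to check whether the index is actually being used.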


Without profiling and such, here is the easy route.

Create a stored procedure to do all the work. It will contain more than one statement.

Create a temporary table (see here) to hold your results. Call this, say, "resultsTable".

Create another temporary table to hold some temporary results. Call this, say, "tempTable".

Create a cursor (see here) that selects each record in the order you want to loop through them (e.g., sort by team).

Use the cursor to loop through each record. Insert each record into tempTable. Store the previous team in a variable and when you encounter a new team (or the end of the records), check tempTable for the "last two" values and insert the result into resultsTable. Clear the tempTable.

Once you have looped through each record, you are done.

Some of that can be optimized, but like I said, it's the easy route, so it may be easiest for you to just do this.
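The steps above could be sketched roughly as follows. This is a simplified variant, not the answer's exact recipe: it keeps the previous row of the current team in local variables instead of a second "tempTable", which is one of the optimizations hinted at. The procedure name last_two_diff and all variable names are made up for illustration, and the code assumes the tbl1 table from the question.

```sql
DELIMITER //

CREATE PROCEDURE last_two_diff()
BEGIN
    -- Loop state
    DECLARE done INT DEFAULT 0;
    DECLARE v_id INT;
    DECLARE v_team VARCHAR(10);
    DECLARE v_score INT;
    -- Team currently being scanned, and its last two scores seen so far
    DECLARE cur_team VARCHAR(10) DEFAULT NULL;
    DECLARE rows_in_team INT DEFAULT 0;
    DECLARE last_id INT DEFAULT NULL;
    DECLARE last_score INT DEFAULT NULL;
    DECLARE prev_score INT DEFAULT NULL;

    -- Rows of a team arrive together, oldest first
    DECLARE rows_cur CURSOR FOR
        SELECT id, team, score FROM tbl1 ORDER BY team, id;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

    DROP TEMPORARY TABLE IF EXISTS resultsTable;
    CREATE TEMPORARY TABLE resultsTable (
        id INT,
        team VARCHAR(10),
        score_diff INT
    );

    OPEN rows_cur;
    read_loop: LOOP
        FETCH rows_cur INTO v_id, v_team, v_score;
        IF done THEN
            LEAVE read_loop;
        END IF;

        IF rows_in_team > 0 AND v_team = cur_team THEN
            -- Same team: the old "last" row becomes the "previous" row
            SET prev_score = last_score;
            SET rows_in_team = rows_in_team + 1;
        ELSE
            -- New team: store the finished team's result if it had two rows
            IF rows_in_team >= 2 THEN
                INSERT INTO resultsTable
                VALUES (last_id, cur_team, last_score - prev_score);
            END IF;
            SET cur_team = v_team;
            SET rows_in_team = 1;
        END IF;

        SET last_id = v_id;
        SET last_score = v_score;
    END LOOP;
    CLOSE rows_cur;

    -- End of the records: store the final team as well
    IF rows_in_team >= 2 THEN
        INSERT INTO resultsTable
        VALUES (last_id, cur_team, last_score - prev_score);
    END IF;

    SELECT id, team, score_diff FROM resultsTable ORDER BY id DESC;
END //

DELIMITER ;
```

It would be invoked with CALL last_two_diff(); and reads the table once, so the cost grows with the number of rows rather than with the size of the self-join. Rows with a NULL team are not grouped together by this sketch.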

