大表优化SQL语句 [英] Optimizing SQL statement for a large table
问题描述
我有一个包含以下表格的数据库:
I have a database with the following tables:
Courses(course varchar(10));
Prerequisite(course varchar(10), prereq varchar(10));
StudentRecord(student varchar(10), course varchar(10), PRIMARY KEY (student, course));
Courses 包含数据库中的所有课程.先决条件包含给定课程的所有先决条件.并且 StudentRecord 保存所有完成给定课程的学生.
Courses holds all the courses in the database. Prerequisite holds all the prerequisites for a given course. And StudentRecord holds all the students who have completed a given course.
我想出了以下查询,以查找学生可以根据他/她已经完成的先决条件参加的课程:
I came up with the following query to find courses that a student can take based on the prerequisites he/she has already completed:
SELECT DISTINCT s.student, c.course from StudentRecord s, Courses c
WHERE NOT EXISTS(SELECT * FROM Prerequisite p where p.course = c.course AND
p.prereq NOT IN(SELECT course from StudentRecord
WHERE student = s.student))
AND NOT EXISTS(SELECT * FROM StudentRecord s2 WHERE
s2.student = s.student AND s2.course = c.course);
此查询执行它应该做的事情,并根据完成的先决条件返回学生可以参加的课程列表.但是,当StudentRecord"是一个包含 500 多个条目的大表时,我的查询会窒息并永远运行.是否有运行速度更快的替代查询?我可以优化我当前的查询以运行得更快吗?任何帮助表示赞赏.
This query does what it's supposed to and returns a list of courses the student can take based on the prerequisites completed. However, when "StudentRecord" is a large table with over 500+ entries, my query chokes and takes forever to run. Is there an alternative query that runs faster? Can I optimize my current query to run faster? Any help is appreciated.
推荐答案
如果您不知道要为查询创建哪些索引,找出问题的一种方法是查看数据库认为最好的索引.
If you don't know what indexes to create for a query, one way to find out is to look what the database thinks are the best ones.
>
首先,创建索引列的所有可能组合:
First, create all possible combinations of indexed columns:
CREATE INDEX c_c ON Courses(course);
CREATE INDEX p_c ON Prerequisite(course);
CREATE INDEX p_cp ON Prerequisite(course,prereq);
CREATE INDEX p_p ON Prerequisite(prereq);
CREATE INDEX p_pc ON Prerequisite(prereq,course);
CREATE INDEX sr_s ON StudentRecord(student);
CREATE INDEX sr_sc ON StudentRecord(student,course);
CREATE INDEX sr_c ON StudentRecord(course);
CREATE INDEX sr_cs ON StudentRecord(course,student);
然后查看查询的EXPLAIN QUERY PLAN的输出:
Then look at the output of EXPLAIN QUERY PLAN for the query:
> EXPLAIN QUERY PLAN SELECT DISTINCT s.student.....;
0|0|0|SCAN TABLE StudentRecord AS s
0|1|1|SCAN TABLE Courses AS c
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE Prerequisite AS p USING COVERING INDEX p_cp (course=?)
1|0|0|EXECUTE CORRELATED LIST SUBQUERY 2
2|0|0|SEARCH TABLE StudentRecord USING COVERING INDEX sr_sc (student=?)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 3
3|0|0|SEARCH TABLE StudentRecord AS s2 USING COVERING INDEX sr_cs (course=? AND student=?)
0|0|0|USE TEMP B-TREE FOR DISTINCT
所以你实际上只需要p_cp
、sr_sc
和sr_cs
索引;放下所有其他人.
So you actually need only the p_cp
, sr_sc
, and sr_cs
indexes; drop all the others.
(如果你知道你在做什么,你可以进一步优化.例如,看起来 DISTINCT 和 sr_cs
索引都不是必需的.)
(If you know what you're doing, you could optimizer further. For example, it looks as if neither the DISTINCT nor the sr_cs
index are necessary.)
这篇关于大表优化SQL语句的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!