大表优化SQL语句 [英] Optimizing SQL statement for a large table

查看:26
本文介绍了大表优化SQL语句的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个包含以下表格的数据库:

I have a database with the following tables:

Courses(course varchar(10));
Prerequisite(course varchar(10), prereq varchar(10));
StudentRecord(student varchar(10), course varchar(10), PRIMARY KEY (student, course));

Courses 包含数据库中的所有课程.先决条件包含给定课程的所有先决条件.并且 StudentRecord 保存所有完成给定课程的学生.

Courses holds all the courses in the database. Prerequisite holds all the prerequisites for a given course. And StudentRecord holds all the students who have completed a given course.

我想出了以下查询,以查找学生可以根据他/她已经完成的先决条件参加的课程:

I came up with the following query to find courses that a student can take based on the prerequisites he/she has already completed:

SELECT DISTINCT s.student, c.course from StudentRecord s, Courses c 
WHERE NOT EXISTS(SELECT * FROM Prerequisite p where p.course = c.course AND
                 p.prereq NOT IN(SELECT course from StudentRecord 
                 WHERE student = s.student)) 
                 AND NOT EXISTS(SELECT * FROM StudentRecord s2 WHERE 
                 s2.student = s.student AND s2.course = c.course);

此查询执行它应该做的事情,并根据完成的先决条件返回学生可以参加的课程列表.但是,当StudentRecord"是一个包含 500 多个条目的大表时,我的查询会窒息并永远运行.是否有运行速度更快的替代查询?我可以优化我当前的查询以运行得更快吗?任何帮助表示赞赏.

This query does what it's supposed to and returns a list of courses the student can take based on the prerequisites completed. However, when "StudentRecord" is a large table with over 500+ entries, my query chokes and takes forever to run. Is there an alternative query that runs faster? Can I optimize my current query to run faster? Any help is appreciated.

推荐答案

如果您不知道要为查询创建哪些索引,找出问题的一种方法是查看数据库认为最好的索引.

If you don't know what indexes to create for a query, one way to find out is to look what the database thinks are the best ones.

>

首先,创建索引列的所有可能组合:

First, create all possible combinations of indexed columns:

CREATE INDEX c_c ON Courses(course);
CREATE INDEX p_c ON Prerequisite(course);
CREATE INDEX p_cp ON Prerequisite(course,prereq);
CREATE INDEX p_p ON Prerequisite(prereq);
CREATE INDEX p_pc ON Prerequisite(prereq,course);
CREATE INDEX sr_s ON StudentRecord(student);
CREATE INDEX sr_sc ON StudentRecord(student,course);
CREATE INDEX sr_c ON StudentRecord(course);
CREATE INDEX sr_cs ON StudentRecord(course,student);

然后查看查询的EXPLAIN QUERY PLAN的输出:

Then look at the output of EXPLAIN QUERY PLAN for the query:

> EXPLAIN QUERY PLAN SELECT DISTINCT s.student.....;
0|0|0|SCAN TABLE StudentRecord AS s
0|1|1|SCAN TABLE Courses AS c
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE Prerequisite AS p USING COVERING INDEX p_cp (course=?)
1|0|0|EXECUTE CORRELATED LIST SUBQUERY 2
2|0|0|SEARCH TABLE StudentRecord USING COVERING INDEX sr_sc (student=?)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 3
3|0|0|SEARCH TABLE StudentRecord AS s2 USING COVERING INDEX sr_cs (course=? AND student=?)
0|0|0|USE TEMP B-TREE FOR DISTINCT

所以你实际上只需要p_cpsr_scsr_cs索引;放下所有其他人.

So you actually need only the p_cp, sr_sc, and sr_cs indexes; drop all the others.

(如果你知道你在做什么,你可以进一步优化.例如,看起来 DISTINCT 和 sr_cs 索引都不是必需的.)

(If you know what you're doing, you could optimizer further. For example, it looks as if neither the DISTINCT nor the sr_cs index are necessary.)

这篇关于大表优化SQL语句的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆