SQL question from Joel Spolsky article

Question

From Joel Spolsky's article on leaky abstractions:

[C]ertain SQL queries are thousands of times slower than other logically equivalent queries. A famous example of this is that some SQL servers are dramatically faster if you specify "where a=b and b=c and a=c" than if you only specify "where a=b and b=c" even though the result set is the same.

Does anyone know the details?

Recommended answer

Obviously, a = b and b = c implies a = c; this is related to transitive closure. The point Joel was making is that some SQL servers are poor at optimizing queries, so some SQL queries end up being written with "extra" qualifiers, as in the example.
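To make the equivalence concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The tables `a`, `b`, `c`, their single column `x`, and the sample data are all made up for illustration. It only checks that adding the redundant `a.x = c.x` predicate leaves the result set unchanged; it says nothing about which form is faster on any particular server.

```python
import sqlite3

def run_both_queries():
    """Return the rows produced by the two logically equivalent queries."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical single-column tables; names and data are illustrative only.
    conn.executescript("""
        CREATE TABLE a (x INTEGER);
        CREATE TABLE b (x INTEGER);
        CREATE TABLE c (x INTEGER);
    """)
    conn.executemany("INSERT INTO a VALUES (?)", [(i,) for i in range(10)])
    conn.executemany("INSERT INTO b VALUES (?)", [(i,) for i in range(0, 10, 2)])
    conn.executemany("INSERT INTO c VALUES (?)", [(i,) for i in range(0, 10, 4)])

    base = "SELECT a.x, b.x, c.x FROM a, b, c WHERE a.x = b.x AND b.x = c.x"
    # Query 1: only the two predicates that are strictly needed.
    short_rows = conn.execute(base + " ORDER BY a.x").fetchall()
    # Query 2: the same query plus the redundant, implied predicate.
    long_rows = conn.execute(base + " AND a.x = c.x ORDER BY a.x").fetchall()
    conn.close()
    return short_rows, long_rows

short_rows, long_rows = run_both_queries()
print(short_rows == long_rows)
```

Whether the extra predicate changes the execution plan depends entirely on the database engine and version, which is exactly the leak in the abstraction that the article describes.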

In this example, remember that a, b and c often refer to different tables, and that operations like a=b are performed as joins. Suppose table a has 1000 rows, b has 500 and c has 20. Then the join of a and b needs 1000x500 row comparisons (this is my dumb example; in practice there may be much better join algorithms that reduce the complexity a lot), and the join of b and c needs 500x20 comparisons. A query optimizer will determine that the join of b and c should be performed first and the result then joined on a = b, since fewer rows are expected with b=c. In total there are about 500x20 + 500x1000 comparisons for (b=c) and (a=b) respectively. After that, intersections have to be computed between the returned rows (I guess also via joins, but I am not sure).
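The arithmetic in this paragraph can be spelled out with a small sketch. It follows the answer's own deliberately naive cost model, where each join is a nested loop comparing every row with every row; the function name `nested_loop_comparisons` is made up here, and real engines typically use indexes, hash joins or merge joins instead.

```python
def nested_loop_comparisons(left_rows: int, right_rows: int) -> int:
    """Comparisons made by a naive nested-loop join: every row against every row."""
    return left_rows * right_rows

# Table sizes taken from the example in the answer.
SIZES = {"a": 1000, "b": 500, "c": 20}

cost_bc = nested_loop_comparisons(SIZES["b"], SIZES["c"])  # join for b = c
cost_ab = nested_loop_comparisons(SIZES["a"], SIZES["b"])  # join for a = b
total_without_inference = cost_bc + cost_ab

print(cost_bc, cost_ab, total_without_inference)  # 10000 500000 510000
```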

Suppose the SQL server had a logic inference module that could also infer that this means a = c. Then it would probably perform the join of b and c and then the join of a and c (again, this is a hypothetical case). This would take 500x20 + 1000x20 comparisons, followed by the intersection computations. If the expected #(a=c) is smaller (due to some domain knowledge), then the second query will be a lot faster.
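As a rough illustration of such a hypothetical inference module, the sketch below derives the implied equality with a union-find structure and then picks the cheapest pair of joins under the same naive row-count model. Union-find is simply one convenient way to compute the closure and is an assumption of this sketch, not a description of how any real optimizer works; all function names are invented for the example.

```python
from itertools import combinations

# Table sizes taken from the example in the answer.
SIZES = {"a": 1000, "b": 500, "c": 20}

def implied_equalities(predicates):
    """Return every equality implied by the given ones via transitivity."""
    parent = {}

    def find(t):
        parent.setdefault(t, t)
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for left, right in predicates:
        parent[find(left)] = find(right)  # merge the two equivalence classes

    tables = sorted(parent)
    return {(p, q) for p, q in combinations(tables, 2) if find(p) == find(q)}

def cost(predicate):
    """Naive nested-loop cost: product of the two table sizes."""
    left, right = predicate
    return SIZES[left] * SIZES[right]

def cheapest_plan(equalities):
    """Pick the cheapest set of joins that still implies every equality.

    Assumes all tables end up in a single equivalence class, as in the example.
    """
    tables = {t for pair in equalities for t in pair}
    candidates = [
        combo
        for combo in combinations(sorted(equalities), len(tables) - 1)
        if implied_equalities(combo) == equalities
    ]
    return min(candidates, key=lambda combo: sum(cost(p) for p in combo))

given = [("a", "b"), ("b", "c")]
inferred = implied_equalities(given)  # also contains ("a", "c")
plan = cheapest_plan(inferred)
plan_cost = sum(cost(p) for p in plan)
print(plan, plan_cost)  # (('a', 'c'), ('b', 'c')) 30000
```

Under this toy model the inferred predicate lets the planner avoid the expensive a-b join altogether: 500x20 + 1000x20 = 30000 comparisons instead of 510000.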

Overall, my answer has become too long, but the takeaway is that SQL query optimization is not a trivial task, which is why some SQL servers may not do it very well.

More can be found at http://en.wikipedia.org/wiki/Query_optimizer, or from some database expert reading this.

But philosophically speaking, SQL (as an abstraction) was meant to hide all aspects of implementation. It was meant to be declarative (a SQL server can itself use query optimization techniques to rephrase a query to make it more efficient). But in the real world it is not so - database queries often have to be rewritten by humans to make them more efficient.

Overall, the point of the article is that an abstraction can only be so good, and no abstraction is perfect.
