MySQL/PostgreSQL中IN关键字的速度 [英] Speed of IN keyword in MySQL/PostgreSQL

查看:138
本文介绍了MySQL/PostgreSQL中IN关键字的速度的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我听到很多人说大多数关系数据库中的IN关键字很慢.这有多真实?一个示例查询就是这样,

I've heard lots of people saying that the IN keyword in most relational databases is slow. How true is this? An example query would be this, off the top of my head:

SELECT * FROM someTable WHERE someColumn IN (value1, value2, value3)

我听说这比这样做要慢得多.

I've heard that is much slower than doing this:

SELECT * FROM someTable WHERE
  someColumn = value1 OR
  someColumn = value2 OR
  someColumn = value3

这是真的吗?还是速度差可以忽略不计?如果重要的话,我使用的是PostgreSQL,但我也想知道MySQL的价格(以及是否有所不同).预先感谢.

Is this true? Or is the speed difference negligible? If it matters, I'm using PostgreSQL, but I'd also like to know how MySQL fares (and if it's any different). Thanks in advance.

推荐答案

在PostgreSQL中,确切的结果取决于基础表,因此您应该在一些示例查询中对有用的数据子集使用EXPLAIN ANALYZE以确定确切的优化程序将要做什么(确保您正在运行的表也已被分析).可以用几种不同的方式处理IN,这就是为什么您需要查看一些样本以弄清楚哪种替代方法用于数据的原因.您的问题没有简单的通用答案.

In PostgreSQL, exactly what you'll get here depends on the underlying table, so you should use EXPLAIN ANALYZE on some sample queries against a useful subset of your data to figure out exactly what the optimizer is going to do (make sure the tables you're running against have been ANALYZEd too). IN can be processed a couple of different ways, and that's why you need to look at some samples to figure out which alternative is being used for your data. There is no simple generic answer to your question.

对于您在修订中添加的特定问题,针对不涉及索引的琐碎数据集,这是您将获得的两个查询计划的示例:

As for the specific question you added in your revision, against a trivial data set with no indexes involved here's an example of the two query plans you'll get:

postgres=# explain analyze select * from x where s in ('123','456');
 Seq Scan on x  (cost=0.00..84994.69 rows=263271 width=181) (actual time=0.015..1819.702 rows=247823 loops=1)
   Filter: (s = ANY ('{123,456}'::bpchar[]))
 Total runtime: 1931.370 ms

postgres=# explain analyze select * from x where s='123' or s='456';
 Seq Scan on x  (cost=0.00..90163.62 rows=263271 width=181) (actual time=0.014..1835.944 rows=247823 loops=1)
   Filter: ((s = '123'::bpchar) OR (s = '456'::bpchar))
 Total runtime: 1949.478 ms

这两个运行时基本上是相同的,因为真正的处理时间主要取决于对表的顺序扫描.多次运行表明两者之间的差异在运行误差以下.如您所见,PostgreSQL将IN大小写转换为使用其ANY过滤器,该过滤器的执行速度通常应比一系列OR更快.同样,这种琐碎的情况并不一定代表您将在涉及索引等的严肃查询中看到的内容.无论如何,用一系列OR语句手动替换IN永远不会比以前更快,因为如果优化器知道要处理的数据很好,那么优化器就会知道最好的做法.

Those two runtimes are essentially identical, because the real processing time is dominated by the sequential scan across the table; running multiple times shows the difference between the two is below the run to run margin of error. As you can see, PostgreSQL transforms the IN case into using its ANY filter, which should always execute faster than a series of ORs. Again, this trivial case is not necessarily representative of what you'll see on a serious query where indexes and the like are involved. Regardless, manually replacing INs with a series of OR statements should never be faster, because the optimizer knows the best thing to do here if it has good data to work with.

通常,PostgreSQL比MySQL优化器了解更多关于如何优化复杂查询的技巧,但是它也很大程度上取决于您是否给优化器足够的数据来使用. PostgreSQL Wiki的性能优化"部分的第一个链接涵盖了从优化器获得好的结果所需的最重要的事情.

In general, PostgreSQL knows more tricks for how to optimize complicated queries than the MySQL optimizer does, but it also relies heavily on your having given the optimizer enough data to work with. The first links on the "Performance Optimization" section of the PostgreSQL wiki covers the most important things needed to get good results from the optimizer.

这篇关于MySQL/PostgreSQL中IN关键字的速度的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆