Performance of IN (subquery)


Problem description

I'm using PG 7.4.3 on Mac OS X.

I am disappointed with the performance of queries like 'select foo from
bar where baz in (subquery)', or updates like 'update bar set foo = 2
where baz in (subquery)'. PG always seems to want to do a sequential
scan of the bar table. I wish there were a way of telling PG, "use the
index on baz in your plan, because I know that the subquery will return
very few results". Where it really matters, I have been constructing
dynamic queries by looping over the values for baz and building a
separate query for each one and combining with a UNION (or just
directly updating, in the update case). Depending on the size of the
bar table, I can get speedups of hundreds or even more than a thousand
times, but it is a big pain to have to do this.

Any tips?

Thanks,
Kevin Murphy

Illustrated:

The query I want to do is very slow:

select bundle_id from build.elements
where elementid in (
    SELECT superlocs_2.element_id
    FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
    WHERE bundle_superlocs_2.protobundle_id = 1);
-----------
7644
7644
(2 rows)
Time: 518.242 ms

The subquery is fast:

SELECT superlocs_2.element_id
FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
WHERE bundle_superlocs_2.protobundle_id = 1;
------------
41209
25047
(2 rows)
Time: 3.268 ms

And using indexes on the main table is fast:

select bundle_id from build.elements
where elementid in (41209, 25047);
-----------
7644
7644
(2 rows)
Time: 2.468 ms

The plan for the slow query:

egenome_test=# explain analyze select bundle_id from build.elements
where elementid in (
    SELECT superlocs_2.element_id
    FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
    WHERE bundle_superlocs_2.protobundle_id = 1);

QUERY PLAN
------------------------------------------------------------------------
Hash Join  (cost=70.33..72.86 rows=25 width=4) (actual time=583.051..583.059 rows=2 loops=1)
  Hash Cond: ("outer".element_id = "inner".elementid)
  ->  HashAggregate  (cost=47.83..47.83 rows=25 width=4) (actual time=0.656..0.658 rows=2 loops=1)
        ->  Hash Join  (cost=22.51..47.76 rows=25 width=4) (actual time=0.615..0.625 rows=2 loops=1)
              Hash Cond: ("outer".superloc_id = "inner".superloc_id)
              ->  Seq Scan on superlocs_2  (cost=0.00..20.00 rows=1000 width=8) (actual time=0.004..0.012 rows=9 loops=1)
              ->  Hash  (cost=22.50..22.50 rows=5 width=4) (actual time=0.076..0.076 rows=0 loops=1)
                    ->  Seq Scan on bundle_superlocs_2  (cost=0.00..22.50 rows=5 width=4) (actual time=0.024..0.033 rows=2 loops=1)
                          Filter: (protobundle_id = 1)
  ->  Hash  (cost=20.00..20.00 rows=1000 width=8) (actual time=581.802..581.802 rows=0 loops=1)
        ->  Seq Scan on elements  (cost=0.00..20.00 rows=1000 width=8) (actual time=0.172..405.243 rows=185535 loops=1)
Total runtime: 593.843 ms
(12 rows)
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org
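
For reference, the workaround described in the question (one query per value returned by the subquery, combined with UNION) might look roughly like this for the two element ids shown in the illustration. This is only a sketch of the approach, not the poster's actual script. Note that UNION removes duplicate rows, so the two 7644 rows returned by the IN query would collapse into one; UNION ALL keeps both.

```sql
-- Sketch of the manual workaround: one lookup per value that the
-- subquery returned (41209 and 25047 in the illustration above).
SELECT bundle_id FROM build.elements WHERE elementid = 41209
UNION
SELECT bundle_id FROM build.elements WHERE elementid = 25047;

-- Variant that preserves duplicates, matching the output of the IN query:
-- SELECT bundle_id FROM build.elements WHERE elementid = 41209
-- UNION ALL
-- SELECT bundle_id FROM build.elements WHERE elementid = 25047;
```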

Solution

On Thu, 26 Aug 2004, Kevin Murphy wrote:

> The query I want to do is very slow:
>
> select bundle_id from build.elements
> where elementid in (
>     SELECT superlocs_2.element_id
>     FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
>     WHERE bundle_superlocs_2.protobundle_id = 1);

What field type is protobundle_id? If you typecast the '1' to be the
same, does the index get used?

Email: sc*****@hub.org Yahoo!: yscrappy ICQ: 7615664
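
The suggestion above can be tried along the following lines. The column type of protobundle_id is not given anywhere in the thread, so bigint here is purely an assumption; substitute the real type. In PostgreSQL releases of this era, comparing a column of one integer type against a bare integer literal of another could keep the planner from using an index on that column, which is what the question is probing.

```sql
-- Check the declared column type first (psql meta-command):
-- \d bundle_superlocs_2

-- Assumption: protobundle_id is bigint. Cast the literal to match it.
EXPLAIN ANALYZE
SELECT superlocs_2.element_id
FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
WHERE bundle_superlocs_2.protobundle_id = CAST(1 AS bigint);

-- Alternative: a quoted literal takes its type from the column
-- it is compared against.
-- ... WHERE bundle_superlocs_2.protobundle_id = '1';
```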



Kevin Murphy wrote:

> ->  Hash  (cost=20.00..20.00 rows=1000 width=8) (actual time=581.802..581.802 rows=0 loops=1)
>       ->  Seq Scan on elements  (cost=0.00..20.00 rows=1000 width=8) (actual time=0.172..405.243 rows=185535 loops=1)

The planner thinks that the sequential scan on elements will return 1000
rows, but it actually returned about 185,000. Did you ANALYZE this table recently?

Afterthought: It would be nice if the database was smart enough to
analyze a table of its own accord when a sequential scan returns more
than, say, 20 times what it was supposed to.

Paul
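
A minimal sketch of the check suggested above. The estimate of rows=1000 at cost 0.00..20.00 is consistent with the defaults the planner of this era assumed for a table that had never been vacuumed or analyzed, while the scan actually returned 185535 rows, roughly 185 times the estimate. The statements are standard PostgreSQL; the table and column names are those from the original post.

```sql
-- What the planner currently believes about the table.
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'elements';

-- Refresh the planner statistics for the table.
-- (A bare "ANALYZE;" does the same for every table in the current database.)
ANALYZE build.elements;

-- Re-run the slow query and compare estimated vs. actual row counts.
EXPLAIN ANALYZE
SELECT bundle_id FROM build.elements
WHERE elementid IN (
    SELECT superlocs_2.element_id
    FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
    WHERE bundle_superlocs_2.protobundle_id = 1);
```

If the row estimates for elements, superlocs_2 and bundle_superlocs_2 move close to the actual counts afterwards, the planner has what it needs to consider an index scan on elementid.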


> Afterthought: It would be nice if the database was smart enough to
> analyze a table of its own accord when a sequential scan returns more
> than, say, 20 times what it was supposed to.

I've wondered on several occasions if there is any good reason for PG not
to automatically perform an analyze concurrently with a seq scan as it's
happening. That way, no extra disk IO is needed and the stats could stay
up-to-date for almost free.

Any hackers around who can say why this might be a bad idea, or is it one
of those things that just needs a volunteer? (I'm not; at least not now.)
