Postgresql IN运算符性能:列表与子查询 [英] Postgresql IN operator Performance: List vs Subquery
本文介绍了Postgresql IN运算符性能:列表与子查询的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
对于约700个id的列表,查询性能比传递返回那700个id的子查询慢20倍以上。
例如(第一个查询需要400毫秒以内的时间,之后的9600毫秒是一个时间)
选择date_trunc('month',day)作为月份,sum(总计)
from table_x
其中y_id(从table_y中选择id,其中prop ='xyz')
和'2015-11-05'和'2016-11-04'之间的日期b $ b按月分组
在我的计算机上比直接传递数组快20倍:
选择date_trunc('month',day)作为月份,table_x
中的总和(总)
其中y_id (1625,1871,...,1640,1643,13291,1458,13304,1407,1765)
和'2015-11-05'和'2016-11-04'
组之间的日期按月
任何想法都可能是问题所在,或者如何优化并获得相同的性能?
解决方案
区别是简单的过滤器与哈希联接:
解释分析
选择i。从T $
b $ b其中i的(500,501,502,503,504,505,506,507,508,509,510,511,512,513,514,515,516,517,518,519,520,521,522,523,524,525,526,527,528,529,530,531,532,533,534,535,536,537,538,539,540,541,542,543,544,545,546,547,548,549,550,551,552,553,554,555,556,557,558,559,560,561,562,563,564,565,566,567,568,569,570,571,572,573,574,575,576,577,578,579,580,581,582,583,584,585,586,587,588,589,590,591,592,593,594,595,596,597,598,599,600);
查询计划
---------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -t上的
序列扫描(成本= 0.00..140675.00行= 101宽度= 4)(实际时间= 0.648..1074.567行= 101循环= 1)
过滤器:(i = ANY( '{500,501,502,503,504,505,506,507,508,509,510,511,512,513,514,515,516,517,518,519,520,521,522,523,524,525,525,526,527,528,529,530,531,532,533,53 4,535,536,537,538,539,540,541,542,543,544,545,546,547,548,549,550,551,552,553,554,555,556,557,558,559,560,561,562,563,564,565,566,567,568,569,570,571,572,573,574,575,576,577,578,579,580,581,582,583,584,585,586,587,588,589,590,591,592,593,594,595,596,597,598,599,600}'::整数[]))$ b。通过过滤器除去$ B行:999899
计划时间:0.170毫秒
执行时间:1074.624毫秒
解释分析
从t
中选择i
其中i in(从r中选择i);
查询计划
---------------------------------------- -------------------------------------------------- -------------------------
哈希半联接(cost = 3.27..17054.40 rows = 101 width = 4)(实际时间= 0.382..240.389行= 101循环= 1)
哈希值(ti = ri)
->在t上进行Seq扫描(成本= 0.00..14425.00行= 1000000宽度= 4)(实际时间= 0.030.0.1117.193行= 1000000循环= 1)哈希(成本= 2.01..2.01行= 101宽度= 4)(实际时间= 0.074..0.074行= 101循环= 1)
存储桶:1024批处理:1内存使用量:12kB
-> ;在r上进行序列扫描(成本= 0.00..2.01行= 101宽度= 4)(实际时间= 0.010..0.035行= 101循环= 1)
计划时间:0.245 ms
执行时间:240.448 ms
要获得相同的性能,请加入数组:
'pre>
解释分析
选择I $ b。从$ $ b b $ BT
内加入
UNNEST(
排列[500,501,502,503,504,505,506,507,508,509,510,511,512,513,514,515,516,517,518,519,520,521,522,523,524,525,526,527,528,529,530,531,532,533,534,535,536,537,538,539,540,541,542,543,544,545,546,547,548,549,550,551,552,553,554,555,556,557,558,559,560,561,562,563,564,565,566,567,568,569,570,571,572,573,574,575,576,577,578,579,580,581,582,583,584,585,586,587,588,589,590,591,592,593,594,595,596,597,598,599,600] :: int []
)u(i)使用(i)
;
查询计划
---------------------------------------- -------------------------------------------------- -----------------------------
哈希联接(cost = 2.25..18178.25 rows = 100 width = 4)(实际时间= 0.267..243.768行= 101循环= 1)
哈希条件:(ti = ui)
->在t上进行Seq扫描(成本= 0.00..14425.00行= 1000000宽度= 4)(实际时间= 0.022..118.709行= 1000000循环= 1)
->哈希(成本= 1.00..1.00行= 100宽度= 4)(实际时间= 0.063..0.063行= 101循环= 1)
存储桶:1024批次:1内存使用量:12kB
-> ;函数扫描非嵌套u(成本= 0.00..1.00行= 100宽度= 4)(实际时间= 0.028..0.041行= 101循环= 1)
计划时间:0.172 ms
执行时间: 243.816 ms
或使用 values
语法:
解释分析
从t
中选择i
,其中i = any(值(500 ),(501),(502),(503),(504),(505),(506),(507),(508),(509),(510),(511),(512), (513),(514),(515),(516),(517),(518),(519),(520),(521),(522),(523),(524),(525) ),(526),(527),(528),(529),(530),(531),(532),(533),(534),(535),(536),(537), (538),(539),(540),(541),(542),(543),(544),(545),(546),(547),(548),(549),(550) ),(551),(552),(553),(554),(555),(556),(557),(558),(559),(560),(561),(562), (563),(564),(565),(566),(567),(568),(569),(570),(571),(572),(573),(574),(575) ),(576),(577),(578),(579),(580),(581),(582),(583),(584),(585),(586),(587), (588),(589),(590),(591),(592),(593),(594),(595),(596),(597),(598),(599),(600) ))
;
查询计划
---------------------------------------- -------------------------------------------------- -----------------------------
哈希半联接(成本= 2.53..17053.65行= 101宽度= 4) (实际时间= 0.279..239.888行= 101循环= 1)
哈希条件:(ti = * VALUES *。column1)
->在t上进行Seq扫描(成本= 0.00..14425.00行= 1000000宽度= 4)(实际时间= 0.022..117.199行= 1000000循环= 1)
->哈希(成本= 1.26..1.26行= 101宽度= 4)(实际时间= 0.059..0.059行= 101循环= 1)
存储桶:1024批次:1内存使用量:12kB
-> ;在 * VALUES *上扫描值(成本= 0.00..1.26行= 101宽度= 4)(实际时间= 0.002..0.027行= 101循环= 1)
计划时间:0.242 ms
执行时间:239.933毫秒
For a list of ~700 ids the query performance is over 20x slower than passing a subquery that returns those 700 ids. It should be the opposite.
e.g. (first query takes under 400ms, the later 9600 ms)
select date_trunc('month', day) as month, sum(total)
from table_x
where y_id in (select id from table_y where prop = 'xyz')
and day between '2015-11-05' and '2016-11-04'
group by month
is 20x faster on my machine than passing the array directly:
select date_trunc('month', day) as month, sum(total)
from table_x
where y_id in (1625, 1871, ..., 1640, 1643, 13291, 1458, 13304, 1407, 1765)
and day between '2015-11-05' and '2016-11-04'
group by month
Any idea what could be the problem or how to optimize and obtain the same performance?
解决方案
The difference is a simple filter vs a hash join:
explain analyze
select i
from t
where i in (500,501,502,503,504,505,506,507,508,509,510,511,512,513,514,515,516,517,518,519,520,521,522,523,524,525,526,527,528,529,530,531,532,533,534,535,536,537,538,539,540,541,542,543,544,545,546,547,548,549,550,551,552,553,554,555,556,557,558,559,560,561,562,563,564,565,566,567,568,569,570,571,572,573,574,575,576,577,578,579,580,581,582,583,584,585,586,587,588,589,590,591,592,593,594,595,596,597,598,599,600);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..140675.00 rows=101 width=4) (actual time=0.648..1074.567 rows=101 loops=1)
Filter: (i = ANY ('{500,501,502,503,504,505,506,507,508,509,510,511,512,513,514,515,516,517,518,519,520,521,522,523,524,525,526,527,528,529,530,531,532,533,534,535,536,537,538,539,540,541,542,543,544,545,546,547,548,549,550,551,552,553,554,555,556,557,558,559,560,561,562,563,564,565,566,567,568,569,570,571,572,573,574,575,576,577,578,579,580,581,582,583,584,585,586,587,588,589,590,591,592,593,594,595,596,597,598,599,600}'::integer[]))
Rows Removed by Filter: 999899
Planning time: 0.170 ms
Execution time: 1074.624 ms
explain analyze
select i
from t
where i in (select i from r);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=3.27..17054.40 rows=101 width=4) (actual time=0.382..240.389 rows=101 loops=1)
Hash Cond: (t.i = r.i)
-> Seq Scan on t (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.030..117.193 rows=1000000 loops=1)
-> Hash (cost=2.01..2.01 rows=101 width=4) (actual time=0.074..0.074 rows=101 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 12kB
-> Seq Scan on r (cost=0.00..2.01 rows=101 width=4) (actual time=0.010..0.035 rows=101 loops=1)
Planning time: 0.245 ms
Execution time: 240.448 ms
To have the same performance join the array:
explain analyze
select i
from
t
inner join
unnest(
array[500,501,502,503,504,505,506,507,508,509,510,511,512,513,514,515,516,517,518,519,520,521,522,523,524,525,526,527,528,529,530,531,532,533,534,535,536,537,538,539,540,541,542,543,544,545,546,547,548,549,550,551,552,553,554,555,556,557,558,559,560,561,562,563,564,565,566,567,568,569,570,571,572,573,574,575,576,577,578,579,580,581,582,583,584,585,586,587,588,589,590,591,592,593,594,595,596,597,598,599,600]::int[]
) u (i) using (i)
;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Hash Join (cost=2.25..18178.25 rows=100 width=4) (actual time=0.267..243.768 rows=101 loops=1)
Hash Cond: (t.i = u.i)
-> Seq Scan on t (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.022..118.709 rows=1000000 loops=1)
-> Hash (cost=1.00..1.00 rows=100 width=4) (actual time=0.063..0.063 rows=101 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 12kB
-> Function Scan on unnest u (cost=0.00..1.00 rows=100 width=4) (actual time=0.028..0.041 rows=101 loops=1)
Planning time: 0.172 ms
Execution time: 243.816 ms
Or use the values
syntax:
explain analyze
select i
from t
where i = any (values (500),(501),(502),(503),(504),(505),(506),(507),(508),(509),(510),(511),(512),(513),(514),(515),(516),(517),(518),(519),(520),(521),(522),(523),(524),(525),(526),(527),(528),(529),(530),(531),(532),(533),(534),(535),(536),(537),(538),(539),(540),(541),(542),(543),(544),(545),(546),(547),(548),(549),(550),(551),(552),(553),(554),(555),(556),(557),(558),(559),(560),(561),(562),(563),(564),(565),(566),(567),(568),(569),(570),(571),(572),(573),(574),(575),(576),(577),(578),(579),(580),(581),(582),(583),(584),(585),(586),(587),(588),(589),(590),(591),(592),(593),(594),(595),(596),(597),(598),(599),(600))
;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=2.53..17053.65 rows=101 width=4) (actual time=0.279..239.888 rows=101 loops=1)
Hash Cond: (t.i = "*VALUES*".column1)
-> Seq Scan on t (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.022..117.199 rows=1000000 loops=1)
-> Hash (cost=1.26..1.26 rows=101 width=4) (actual time=0.059..0.059 rows=101 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 12kB
-> Values Scan on "*VALUES*" (cost=0.00..1.26 rows=101 width=4) (actual time=0.002..0.027 rows=101 loops=1)
Planning time: 0.242 ms
Execution time: 239.933 ms
这篇关于Postgresql IN运算符性能:列表与子查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文