改进PostgreSQL查询性能,让数百万数据左连接 [英] Improve PostgreSQL query performance having left join for 100 millions of data
问题描述
我使用 Postgresql-9.2版本
, Windows 7 64位,
RAM 6GB
。这是一个Java企业项目。
我必须在我的页面中显示订单相关信息。有三个表通过左连接结合在一起。
表:
- TV_HD(389772行)
- TV_SNAPSHOT(1564756行)
- TD_MAKKA(419298行)
在加入3个表后,查询给出 487252
。它也会每天增加。
Table Relationship:
- TV_HD与TV_SNAPSHOT包含一对多关系
- 为了更好的理解,我现在给出一个带有sql查询的图片视图。
SELECT * FROM tv_hd where urino = 1630799
SELECT * FROM tv_snapshot where urino = 1630799
SELECT * FROM td_makka where urino = 1630799
这个查询运行大约在90秒内。我如何提高查询性能?
我也想过编制索引。但据我所知,当我们想从表中获得2%-4%的数据时,实际上使用了索引。但在我的情况下,我需要这3个表中的所有数据。
以下是查询:
SELECT count(*)
FROM(SELECT HD.URINO
FROM
TV_HD HD
LEFT JOIN TV_SNAPSHOT T ON(HD.URINO = T.URINO AND HD.TCODE = T.TCODE AND T.DELFLG = 0 AND T.SYUBETSU = 1)
LEFT JOIN TV_SNAPSHOT T_SQ
ON(HD.URINO = T_SQ.URINO AND HD.SQCODE = T_SQ.TCODE AND T_SQ.DELFLG = 0 AND T_SQ.SYUBETSU = 3)
LEFT JOIN(SELECT N. URINO
FROM
TD_MAKKA N
WHERE
N.UPDATETIME IN(
SELECT MIN(NMIN.UPDATETIME)
FROM
TD_MAKKA NMIN
其中
N.URINO = NMIN.URINO
和
NMIN.TORIKESHIFLG<> -1
)
)NYUMIN
ON(HD.URINO = NYUMIN.URINO)
LEFT JOIN
(
SELECT
NSUM.URINO ,
SUM(COALESCE(NSUM.NYUKIN,0))NYUKIN,
SUM(COALESCE(NSUM.NYUKIN,0))+ SUM(COALESCE(NSUM.TESU,0))+ SUM(COALESCE NSUM.SOTA,0))SUMNYUKIN
FROM
TD_MAKKA NSUM
GROUP BY
URINO
)NYUSUM
ON(HD.URINO = NYUSUM.URINO)
LEFT JOIN
(
SELECT N.URINO
FROM
TD_MAKKA N
WHERE
UPDATETIME =(
SELECT MAX(UPDATETIME )
FROM
TD_MAKKA NMAX
WHERE
N.URINO = NMAX.URINO
AND
NMAX.TORIKESHIFLG<> ; -1(HD.URIBRUI<> 1)或(HD.URIBRUI<> 1)b $ b)NYUMAX
ON(HD.URINO = NYUMAX.URINO)
WHERE ='1'AND T_SQ.NYUKOBEFLG ='-1'))
ORDER BY
HD.URINO DESC
)COUNT_
EXPLAIN ANALYZE
<$ p $ (实际时间= 69549.159..69549.159行= 1个循环= 1)
- >>总计(成本= 7246861.21..7246861.22行= 1宽度= 0)合并左连接(cost = 7240188.92..7242117.36 rows = 379508 width = 6)(actual time = 68602.689..69510.563 rows = 487252 loops = 1)
合并条件:(hd.urino = n.urino)
- >排序(成本= 3727299.33..3728248.10行= 379508宽度= 6)(实际时间= 62160.072..62557.132行= 420036循环= 1)
排序关键字:hd.urino
排序方法:外部合并磁盘:6984kB
- > Hash Right Join(cost = 169264.26..3686940.26 rows = 379508 width = 6)(actual time = 54796.930..60172.248 rows = 420036 loops = 1)
Hash Cond:(n.urino = hd.urino)
- > Seq Scan on td_makka n(cost = 0.00..3511201.36 rows = 209673 width = 6)(actual time = 24.326..4640.020 rows = 419143 loops = 1)
Filter:(SubPlan 1)
Rows Removed通过筛选:155
SubPlan 1
- >总成本(成本= 8.33..8.34行= 1宽度= 23)(实际时间= 0.009..0.009行= 1个循环= 419298)
- >使用td_makka上的idx_td_makka进行索引扫描nmin(cost = 0.00..8.33 rows = 1 width = 23)(actual time = 0.006..0.007 rows = 1 loops = 419298)
Index Cond:(n.urino = urino)
过滤器:(torikeshiflg<>(-1):: numeric)
被过滤器删除的行:0
- >哈希(成本= 163037.41..163037.41行= 379508宽度= 6)(实际时间= 54771.078..54771.078行= 386428循环= 1)
存储桶:4096批次:16内存使用情况:737kB
- > ;哈希右连接(cost = 75799.55..163037.41 rows = 379508width = 6)(实际时间= 51599.167..54605.901 rows = 386428 loops = 1)
哈希条件:((t_sq.urino = hd.urino)AND (t_sq.tcode = hd.sqcode))
过滤器:((hd.uribrui<> 1':: bpchar)OR((hd.uribrui ='1':: bpchar)AND(t_sq。 nyukobeflg =(-1):: numeric)))
行被Filter删除:3344
- > Seq Scan on tv_snapshot t_sq(cost = 0.00..73705.42 rows = 385577 width = 15)(actual time = 0.053..2002.953 rows = 389983 loops = 1)
Filter((delflg = 0 :: numeric)AND (syubetsu = 3 :: numeric))
过滤器删除的行数:1174773
- >哈希(cost = 68048.99..68048.99 rows = 389771 width = 14)(实际时间= 51596.055..51596.055 rows = 389772 loops = 1)
存储桶:4096批次:16内存使用情况:960kB
- > ;哈希右连接(cost = 21125.85..68048.99 rows = 389771 width = 14)(实际时间= 579.405..51348.270 rows = 389772 loops = 1)
哈希条件:(nyusum.urino = hd.urino)
- >在nyusum上进行子查询扫描(成本= 0.00..35839.52行= 365638宽度= 6)(实际时间= 17.435..49996.674行= 385537循环= 1)
- > GroupAggregate(cost = 0.00..32183.14 rows = 365638 width = 34)(actual time = 17.430..49871.702 rows = 385537 loops = 1)
- >使用td_makka nsum上的idx_td_makka进行索引扫描(cost = 0.00..21456.76 rows = 419345 width = 34)(actual time = 0.017..48357.702 rows = 419298 loops = 1)
- >哈希(成本= 13969.71..13969.71 rows = 389771 width = 20)(实际时间= 491.549..491.549 rows = 389772 loops = 1)
存储桶:4096批次:32内存使用情况:567kB
- > ; Seq Scan on tv_hd hd(cost = 0.00..13969.71 rows = 389771 width = 20)(actual time = 0.052..242.415 rows = 389772 loops = 1)
- >排序(成本= 3512889.60..3512894.84行= 2097宽度= 6)(实际时间= 6442.600..6541.728行= 486359环路= 1)
排序键:n.urino
排序方式:外部分类磁盘:8600kB
- > Seq Scan on td_makka n(cost = 0.00..3512773.90 rows = 2097 width = 6)(actual time = 0.135..4053.116 rows = 419143 loops = 1)
过滤器:((更新时间):: text =(SubPlan 2))
过滤器删除的行数:155
SubPlan 2
- >总成本(成本= 8.33..8.34行= 1宽度= 23)(实际时间= 0.008..0.008行= 1个循环= 419298)
- >使用idx_td_makka在td_makka上进行索引扫描nmax(cost = 0.00..8.33 rows = 1 width = 23)(实际时间= 0.005..0.006 rows = 1 loops = 419298)
Index Cond:(n.urino = urino)
过滤器:(torikeshiflg<>(-1):: numeric)
过滤器删除的行数:0
总运行时间:69575.139 ms
以下是解释分析结果的详细资料:
http://explain.depesz.com/s/23Fg
解决方案第一步:
您可以删除选择查询中不需要的更多列,因为您只需计算总计行数。例如:
select count(*)from(SELECT
HD.URINO
FROM
TV_HD HD
LEFT JOIN TV_SNAPSHOT T ON(HD.URINO = T.URINO和HD.TCODE = T.TCODE AND T.DELFLG = 0且T.SYUBETSU = 1)
LEFT JOIN TV_SNAPSHOT T_SQ ON (HD.URINO = T_SQ.URINO AND HD.SQCODE = T_SQ.TCODE AND T_SQ.DELFLG = 0 AND T_SQ.SYUBETSU = 3)
LEFT JOIN(SELECT
N.URINO
FROM
TD_MAKKA N
WHERE
N.UPDATETIME IN(
SELECT
MIN(NMIN.UPDATETIME)
FROM
TD_MAKKA NMIN
WHERE
N.URINO = NMIN.URINO
AND
NMIN.TORIKESHIFLG<> -1
)
)NYUMIN
ON(HD.URINO = NYUMIN.URINO)
LEFT JOIN
(
SELECT
NSUM.URINO
,SUM(COALESCE(NSUM.NYUKIN,0))NYUKIN
,SUM(COALESCE(NSUM.NYUKIN,0))+ SUM(COALESCE SUMUMYEKIN
FROM
TD_MAKKA NSUM
GROUP BY
URINO
)NYUSUM
ON(HD.URINO = NYUSUM.URINO)
LEFT JOIN
(
SELECT
N.URINO
FROM
TD_MAKKA N
WHERE
UPDATETIME =(
SELECT
MAX(UPDATETIME)
FROM
TD_MAKKA NMAX
WHERE
N.URINO = NMAX.URINO
AND
NMAX.TORIKESHIFLG< -1(HD.URIBRUI<> 1)或(HD.URIBRUI<> 1)b $ b)NYUMAX
ON(HD.URINO = NYUMAX.URINO)
WHERE ='1'AND T_SQ.NYUKOBEFLG ='-1'))
ORDER BY
HD.URINO DESC
)COUNT_
第二步:
您可以避免左连接,这对获取行数没有意义。
举例:
select count(*)from(SELECT
HD.URINO
FROM
TV_HD HD
LEFT JOIN TV_SNAPSHOT T ON(HD.URINO = T.URINO AND HD.TCODE = T.TCODE AND T.DELFLG = 0 AND T.SYUBETSU = 1)
LEFT加入TV_SNAPSHOT T_SQ ON(HD.URINO = T_SQ.URINO AND HD.SQCODE = T_SQ.TCODE AND T_SQ.DELFLG = 0 AND T_SQ.SYUBETSU = 3)
LEFT JOIN(SELECT
N.URINO
FROM
TD_MAKKA N
WHERE
N.UPDATETIME IN(
SELECT
MIN(NMIN.UPDATETIME)
FROM
TD_MAKKA NMIN
WHERE
N.URINO = NMIN.URINO
AND
NMIN.TORIKESHIFLG<> -1
)
)NYUMIN
ON( HD.URINO = NYUMIN.URINO)
LEFT JOIN
(
SELECT
N.URINO
FROM
TD_MAKKA N
WHERE
UPDATETIME =(
SELECT
MAX(UPDATETIME)
FROM
TD_MAKKA NMAX
WHERE
N.URINO = NMAX.URINO
AND
NMAX.TORIKESHIFLG< -1(HD.URIBRUI<> 1)或(HD.URIBRUI<> 1)b $ b)NYUMAX
ON(HD.URINO = NYUMAX.URINO)
WHERE ='1'AND T_SQ.NYUKOBEFLG ='-1'))
)COUNT_
$ b $第三步:您可以使用 PgAdmin图形解释计划来分析查询并避免其他不必要的执行开销。I am using
Postgresql-9.2 version
,Windows 7 64 bit
,RAM 6GB
. This is a Java enterprise project.I have to show orders related info in my page. There are three tables which are brought together via a left join.
Tables:
- TV_HD (389772 rows)
- TV_SNAPSHOT (1564756 rows)
- TD_MAKKA (419298 rows)
After left joining 3 tables, the query gives
487252
. It will also increase day by day.Table Relationship:
- TV_HD contains "one to many" relationship with TV_SNAPSHOT
- TV_HD contains "one to many" relationship with TD_MAKKA
For better understanding I am now giving a pictorial view with sql query
SELECT * FROM tv_hd where urino = 1630799
SELECT * FROM tv_snapshot where urino = 1630799
SELECT * FROM td_makka where urino = 1630799 This query runs in approximately 90 seconds. How can I improve my query performance?
I have also thought about indexing. But as far I know, indexing is actually used when we want to get 2%-4% data from table.` But in my case, I need all of data from those 3 tables.
Here is the query:
SELECT count(*) FROM (SELECT HD.URINO FROM TV_HD HD LEFT JOIN TV_SNAPSHOT T ON (HD.URINO = T.URINO AND HD.TCODE = T.TCODE AND T.DELFLG = 0 AND T.SYUBETSU = 1) LEFT JOIN TV_SNAPSHOT T_SQ ON (HD.URINO = T_SQ.URINO AND HD.SQCODE = T_SQ.TCODE AND T_SQ.DELFLG = 0 AND T_SQ.SYUBETSU = 3) LEFT JOIN (SELECT N.URINO FROM TD_MAKKA N WHERE N.UPDATETIME IN ( SELECT MIN(NMIN.UPDATETIME) FROM TD_MAKKA NMIN WHERE N.URINO = NMIN.URINO AND NMIN.TORIKESHIFLG <> -1 ) ) NYUMIN ON (HD.URINO = NYUMIN.URINO) LEFT JOIN ( SELECT NSUM.URINO, SUM(COALESCE(NSUM.NYUKIN, 0)) NYUKIN, SUM(COALESCE(NSUM.NYUKIN, 0)) + SUM(COALESCE(NSUM.TESU, 0)) + SUM(COALESCE(NSUM.SOTA, 0)) SUMNYUKIN FROM TD_MAKKA NSUM GROUP BY URINO ) NYUSUM ON (HD.URINO = NYUSUM.URINO) LEFT JOIN ( SELECT N.URINO FROM TD_MAKKA N WHERE UPDATETIME = ( SELECT MAX(UPDATETIME) FROM TD_MAKKA NMAX WHERE N.URINO = NMAX.URINO AND NMAX.TORIKESHIFLG <> -1 ) ) NYUMAX ON (HD.URINO = NYUMAX.URINO) WHERE ((HD.URIBRUI <> '1') OR (HD.URIBRUI = '1' AND T_SQ.NYUKOBEFLG = '-1')) ORDER BY HD.URINO DESC ) COUNT_
Here is the result of
EXPLAIN ANALYZE
Aggregate (cost=7246861.21..7246861.22 rows=1 width=0) (actual time=69549.159..69549.159 rows=1 loops=1) -> Merge Left Join (cost=7240188.92..7242117.36 rows=379508 width=6) (actual time=68602.689..69510.563 rows=487252 loops=1) Merge Cond: (hd.urino = n.urino) -> Sort (cost=3727299.33..3728248.10 rows=379508 width=6) (actual time=62160.072..62557.132 rows=420036 loops=1) Sort Key: hd.urino Sort Method: external merge Disk: 6984kB -> Hash Right Join (cost=169264.26..3686940.26 rows=379508 width=6) (actual time=54796.930..60172.248 rows=420036 loops=1) Hash Cond: (n.urino = hd.urino) -> Seq Scan on td_makka n (cost=0.00..3511201.36 rows=209673 width=6) (actual time=24.326..4640.020 rows=419143 loops=1) Filter: (SubPlan 1) Rows Removed by Filter: 155 SubPlan 1 -> Aggregate (cost=8.33..8.34 rows=1 width=23) (actual time=0.009..0.009 rows=1 loops=419298) -> Index Scan using idx_td_makka on td_makka nmin (cost=0.00..8.33 rows=1 width=23) (actual time=0.006..0.007 rows=1 loops=419298) Index Cond: (n.urino = urino) Filter: (torikeshiflg <> (-1)::numeric) Rows Removed by Filter: 0 -> Hash (cost=163037.41..163037.41 rows=379508 width=6) (actual time=54771.078..54771.078 rows=386428 loops=1) Buckets: 4096 Batches: 16 Memory Usage: 737kB -> Hash Right Join (cost=75799.55..163037.41 rows=379508 width=6) (actual time=51599.167..54605.901 rows=386428 loops=1) Hash Cond: ((t_sq.urino = hd.urino) AND (t_sq.tcode = hd.sqcode)) Filter: ((hd.uribrui <> '1'::bpchar) OR ((hd.uribrui = '1'::bpchar) AND (t_sq.nyukobeflg = (-1)::numeric))) Rows Removed by Filter: 3344 -> Seq Scan on tv_snapshot t_sq (cost=0.00..73705.42 rows=385577 width=15) (actual time=0.053..2002.953 rows=389983 loops=1) Filter: ((delflg = 0::numeric) AND (syubetsu = 3::numeric)) Rows Removed by Filter: 1174773 -> Hash (cost=68048.99..68048.99 rows=389771 width=14) (actual time=51596.055..51596.055 rows=389772 loops=1) Buckets: 4096 Batches: 16 Memory Usage: 960kB -> Hash Right Join (cost=21125.85..68048.99 rows=389771 width=14) (actual time=579.405..51348.270 rows=389772 loops=1) Hash Cond: (nyusum.urino = hd.urino) -> Subquery Scan on nyusum (cost=0.00..35839.52 rows=365638 width=6) (actual time=17.435..49996.674 rows=385537 loops=1) -> GroupAggregate (cost=0.00..32183.14 rows=365638 width=34) (actual time=17.430..49871.702 rows=385537 loops=1) -> Index Scan using idx_td_makka on td_makka nsum (cost=0.00..21456.76 rows=419345 width=34) (actual time=0.017..48357.702 rows=419298 loops=1) -> Hash (cost=13969.71..13969.71 rows=389771 width=20) (actual time=491.549..491.549 rows=389772 loops=1) Buckets: 4096 Batches: 32 Memory Usage: 567kB -> Seq Scan on tv_hd hd (cost=0.00..13969.71 rows=389771 width=20) (actual time=0.052..242.415 rows=389772 loops=1) -> Sort (cost=3512889.60..3512894.84 rows=2097 width=6) (actual time=6442.600..6541.728 rows=486359 loops=1) Sort Key: n.urino Sort Method: external sort Disk: 8600kB -> Seq Scan on td_makka n (cost=0.00..3512773.90 rows=2097 width=6) (actual time=0.135..4053.116 rows=419143 loops=1) Filter: ((updatetime)::text = (SubPlan 2)) Rows Removed by Filter: 155 SubPlan 2 -> Aggregate (cost=8.33..8.34 rows=1 width=23) (actual time=0.008..0.008 rows=1 loops=419298) -> Index Scan using idx_td_makka on td_makka nmax (cost=0.00..8.33 rows=1 width=23) (actual time=0.005..0.006 rows=1 loops=419298) Index Cond: (n.urino = urino) Filter: (torikeshiflg <> (-1)::numeric) Rows Removed by Filter: 0 Total runtime: 69575.139 ms
Here is the explain analyze result details:
http://explain.depesz.com/s/23Fg
解决方案First step: You could remove more column which is not required in your select query where as you only need to count the total rows. For Example:
select count(*) from ( SELECT HD.URINO FROM TV_HD HD LEFT JOIN TV_SNAPSHOT T ON (HD.URINO = T.URINO AND HD.TCODE = T.TCODE AND T.DELFLG = 0 AND T.SYUBETSU = 1) LEFT JOIN TV_SNAPSHOT T_SQ ON (HD.URINO = T_SQ.URINO AND HD.SQCODE = T_SQ.TCODE AND T_SQ.DELFLG = 0 AND T_SQ.SYUBETSU = 3) LEFT JOIN (SELECT N.URINO FROM TD_MAKKA N WHERE N.UPDATETIME IN ( SELECT MIN (NMIN.UPDATETIME) FROM TD_MAKKA NMIN WHERE N.URINO = NMIN.URINO AND NMIN.TORIKESHIFLG <> -1 ) ) NYUMIN ON (HD.URINO = NYUMIN.URINO) LEFT JOIN ( SELECT NSUM.URINO ,SUM (COALESCE(NSUM.NYUKIN ,0)) NYUKIN ,SUM (COALESCE(NSUM.NYUKIN ,0)) + SUM (COALESCE(NSUM.TESU ,0)) + SUM (COALESCE(NSUM.SOTA ,0)) SUMNYUKIN FROM TD_MAKKA NSUM GROUP BY URINO ) NYUSUM ON (HD.URINO = NYUSUM.URINO) LEFT JOIN ( SELECT N.URINO FROM TD_MAKKA N WHERE UPDATETIME = ( SELECT MAX (UPDATETIME) FROM TD_MAKKA NMAX WHERE N.URINO = NMAX.URINO AND NMAX.TORIKESHIFLG <> -1 ) ) NYUMAX ON (HD.URINO = NYUMAX.URINO) WHERE ( (HD.URIBRUI <> '1') OR ( HD.URIBRUI = '1' AND T_SQ.NYUKOBEFLG = '-1' ) ) ORDER BY HD.URINO DESC ) COUNT_
Second Step: You could avoid left join which does not have significance for getting the row counts. For Example:
select count(*) from ( SELECT HD.URINO FROM TV_HD HD LEFT JOIN TV_SNAPSHOT T ON (HD.URINO = T.URINO AND HD.TCODE = T.TCODE AND T.DELFLG = 0 AND T.SYUBETSU = 1) LEFT JOIN TV_SNAPSHOT T_SQ ON (HD.URINO = T_SQ.URINO AND HD.SQCODE = T_SQ.TCODE AND T_SQ.DELFLG = 0 AND T_SQ.SYUBETSU = 3) LEFT JOIN (SELECT N.URINO FROM TD_MAKKA N WHERE N.UPDATETIME IN ( SELECT MIN (NMIN.UPDATETIME) FROM TD_MAKKA NMIN WHERE N.URINO = NMIN.URINO AND NMIN.TORIKESHIFLG <> -1 ) ) NYUMIN ON (HD.URINO = NYUMIN.URINO) LEFT JOIN ( SELECT N.URINO FROM TD_MAKKA N WHERE UPDATETIME = ( SELECT MAX (UPDATETIME) FROM TD_MAKKA NMAX WHERE N.URINO = NMAX.URINO AND NMAX.TORIKESHIFLG <> -1 ) ) NYUMAX ON (HD.URINO = NYUMAX.URINO) WHERE ( (HD.URIBRUI <> '1') OR ( HD.URIBRUI = '1' AND T_SQ.NYUKOBEFLG = '-1' ) ) ) COUNT_
Third Step: You could use PgAdmin Graphical Explain Plans to analyze the query and avoid others unnecessary execution overhead.
这篇关于改进PostgreSQL查询性能,让数百万数据左连接的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!