SQL Server table with 100k records, 2 Inner Joins Extremely Slow


Question

I'm migrating my data from SQL Server to Postgres.

I'm changing my table structure to handle generic sports matches, but it is giving me performance problems.

I have the following tables:

  • matches (id, start_time)
  • match_teams (id, match_id, team_id, score)
  • match_players (id, lineup_id, player_id), where lineup_id is a foreign key on match_teams.id

I'm selecting all matches with the following query:

SELECT * FROM matches AS m
INNER JOIN match_teams AS t ON m.id = t.match_id
INNER JOIN match_players AS p ON t.id = p.lineup_id

With 100k records, this query takes around 6 minutes:

-- Executing query:
SELECT * FROM matches AS m
INNER JOIN match_teams AS t ON m.id = t.match_id
INNER JOIN match_players AS p ON t.id = p.lineup_id
Total query runtime: 336360 ms.
1142078 rows retrieved.

On SQL Server, I had all of this data in one table and it would return in less than 5 seconds. In Postgres, I have also put this data into 1 table with jsonb, and was able to run the above query in 40 seconds.

How can I make this query faster? I would like to get it down to seconds.

Reading online, I found that creating indexes can speed up these joins. I've created the following indexes:

CREATE INDEX match_teams_match_id_idx ON match_teams USING btree (match_id);
CREATE INDEX match_players_lineup_id_idx ON match_players USING btree (lineup_id);
CREATE INDEX match_players_player_id_idx ON match_players USING btree (player_id);
CREATE INDEX matches_id_idx ON matches USING btree (id);

These indexes haven't made the query faster at all. Am I missing one?

Here's the EXPLAIN ANALYZE VERBOSE output of the above query:

"Hash Join  (cost=19314.10..67893.04 rows=1135917 width=24) (actual time=401.225..1624.906 rows=1142078 loops=1)"
"  Output: m.id, m.start_time, t.team_id, t.rank, p.player_id"
"  Hash Cond: (p.lineup_id = t.id)"
"  ->  Seq Scan on public.match_players p  (cost=0.00..19818.78 rows=1142078 width=8) (actual time=0.039..356.168 rows=1142078 loops=1)"
"        Output: p.player_id, p.lineup_id"
"  ->  Hash  (cost=15119.58..15119.58 rows=228442 width=24) (actual time=401.123..401.123 rows=228442 loops=1)"
"        Output: m.id, m.start_time, t.team_id, t.rank, t.id"
"        Buckets: 8192  Batches: 4  Memory Usage: 3358kB"
"        ->  Hash Join  (cost=5097.97..15119.58 rows=228442 width=24) (actual time=74.766..310.864 rows=228442 loops=1)"
"              Output: m.id, m.start_time, t.team_id, t.rank, t.id"
"              Hash Cond: (t.match_id = m.id)"
"              ->  Seq Scan on public.match_teams t  (cost=0.00..3519.42 rows=228442 width=16) (actual time=0.004..64.580 rows=228442 loops=1)"
"                    Output: t.team_id, t.rank, t.match_id, t.id"
"              ->  Hash  (cost=3112.21..3112.21 rows=114221 width=12) (actual time=74.728..74.728 rows=114221 loops=1)"
"                    Output: m.id, m.start_time"
"                    Buckets: 16384  Batches: 2  Memory Usage: 2682kB"
"                    ->  Seq Scan on public.matches m  (cost=0.00..3112.21 rows=114221 width=12) (actual time=0.003..34.789 rows=114221 loops=1)"
"                          Output: m.id, m.start_time"
"Planning time: 0.448 ms"
"Execution time: 1799.412 ms"

Update

Added DDL Here: http://pastie.org/10529040

Update 2

Postgres is running on an AWS RDS server. I tried running the above query from a clean EC2 server with a clean pgAdmin install and got the same result: the query appears to run in ~2 seconds, but it takes ~6 minutes to display the data.
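To separate server-side execution time from client-side fetch-and-render time, one option is to measure only what happens on the server. A sketch, using the same query: `EXPLAIN (ANALYZE)` runs the query and reports its execution time without sending the result set to the client, while `count(*)` forces the server to consume every joined row but return a single value.

```sql
-- Measure server-side execution only; no result rows are sent to the client:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM matches AS m
INNER JOIN match_teams AS t ON m.id = t.match_id
INNER JOIN match_players AS p ON t.id = p.lineup_id;

-- Alternatively, make the server produce all joined rows but return one value:
SELECT count(*) FROM matches AS m
INNER JOIN match_teams AS t ON m.id = t.match_id
INNER JOIN match_players AS p ON t.id = p.lineup_id;
```

If either of these finishes in a couple of seconds, the remaining time is being spent transferring and rendering 1.1M rows, not executing the join.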

Update 3

I tried running this query from a simple C# program and the results were returned in around 10 seconds. This appears to be an issue with pgAdmin.
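When the goal is just to inspect results interactively, it may help to cap the result set, since pgAdmin has to render every row it fetches. A minimal sketch:

```sql
-- Cap the result set when eyeballing data in a GUI client;
-- the join itself is unchanged, only the number of rows fetched:
SELECT * FROM matches AS m
INNER JOIN match_teams AS t ON m.id = t.match_id
INNER JOIN match_players AS p ON t.id = p.lineup_id
LIMIT 100;
```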

Answer

Postgres has a very smart query engine. I use Postgres and have often heard from others that "Postgres is slow" - however, I have never experienced this. It may just not ship with the defaults other DBMSs have, so you need to understand a bit about optimization.

A couple of key points:

  • All tables should have a primary key, declared with a PRIMARY KEY constraint
  • Any large table that you join, ORDER BY, or GROUP BY on is optimized once you put a btree index on the relevant columns (as above)

Because the engine "thinks" for itself, you often need to tell Postgres to re-analyze a table once you put an index on it. You also sometimes need to vacuum it, to clear out any "dead" rows Postgres thinks it has.

To do this, run:

VACUUM [schema].[table_name];
ANALYZE [schema].[table_name];

Do this AFTER you have put your indexes on it, and it should greatly speed up your query execution.
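Applied to the three tables in the question (assuming they live in the `public` schema), that might look like:

```sql
-- VACUUM ANALYZE does both steps in one pass per table:
VACUUM ANALYZE public.matches;
VACUUM ANALYZE public.match_teams;
VACUUM ANALYZE public.match_players;
```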

NOTE: You do NOT need to put an index on a PRIMARY KEY as the PRIMARY KEY CONSTRAINT creates an automatic unique INDEX.
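If the DDL does not already declare primary keys, a sketch of adding them (the constraint names here are illustrative):

```sql
-- Each PRIMARY KEY constraint automatically creates a unique btree index
-- on (id), which makes the separate matches_id_idx above redundant:
ALTER TABLE matches       ADD CONSTRAINT matches_pkey       PRIMARY KEY (id);
ALTER TABLE match_teams   ADD CONSTRAINT match_teams_pkey   PRIMARY KEY (id);
ALTER TABLE match_players ADD CONSTRAINT match_players_pkey PRIMARY KEY (id);
```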

As long as you have a PRIMARY KEY constraint on the match_teams table (id), then you should only need to put a btree index on the match_teams table like so:

CREATE INDEX match_teams_match_id_idx ON match_teams USING btree (match_id);
