将同一表的多个列上的条件计数相加 [英] Add up conditional counts on multiple columns of the same table
问题描述
我正在寻找一种更好"的方式来执行查询,在该查询中,我想向单个玩家显示他之前玩过的游戏以及与每个此类对手相关的获胜记录.
以下是涉及的表,精简如下:
create table player (player_id int, username text);
create table match (winner_id int, loser_id int);
insert into player values (1, 'john'), (2, 'mary'), (3, 'bob'), (4, 'alice');
insert into match values (1, 2), (1, 2), (1, 3), (1, 4), (1, 4), (1, 4)
, (2, 1), (4, 1), (4, 1);
因此,约翰对玛丽的战绩为2胜1负; vs鲍伯1胜0负;和爱丽丝的3胜2负.
create index idx_winners on match(winner_id);
create index idx_winners on match(loser_id);
我正在使用Postgres 9.4.我脑海中有些东西告诉我以某种方式考虑LATERAL
,但是我很难理解这样的形状".
以下是我当前正在使用的查询,但是有些感觉不到".请帮助我学习和改进.
select p.username as opponent,
coalesce(r.won, 0) as won,
coalesce(r.lost, 0) as lost
from (
select m.winner_id, m.loser_id, count(m.*) as won, (
select t.lost
from (
select winner_id, loser_id, count(*) as lost
from match
where loser_id = m.winner_id
and winner_id = m.loser_id
group by winner_id, loser_id
) t
)
from match m
where m.winner_id = 1 -- this would be a parameter
group by m.winner_id, m.loser_id
) r
join player p on p.player_id = r.loser_id;
这按预期工作.只是想学习一些技巧或更好但更合适的技术来做到这一点.
opponent won lost
-------- --- ----
alice 3 2
bob 1 0
mary 2 1
查询
查询并不像初看起来那样简单.最短的查询字符串不一定会产生最佳性能.这应该尽快完成,为此应尽可能短:
SELECT p.username, COALESCE(w.ct, 0) AS won, COALESCE(l.ct, 0) AS lost
FROM (
SELECT loser_id AS player_id, count(*) AS ct
FROM match
WHERE winner_id = 1 -- your player_id here
GROUP BY 1 -- positional reference (not your player_id)
) w
FULL JOIN (
SELECT winner_id AS player_id, count(*) AS ct
FROM match
WHERE loser_id = 1 -- your player_id here
GROUP BY 1
) l USING (player_id)
JOIN player p USING (player_id)
ORDER BY 1;
结果完全符合要求:
username | won | lost
---------+-----+-----
alice | 3 | 2
bob | 1 | 0
mary | 2 | 1
SQL提琴 -具有更多可揭示的测试数据!
关键功能是 FULL [OUTER] JOIN
两个子查询之间的"strong> "(输赢).这将产生一个表格,其中列出了我们的候选人所针对的所有玩家.连接条件中的USING
子句可以方便地将两个player_id
列合并为一个.
此后,用一个JOIN
到player
来获取名称,然后仅索引扫描)出于这一点.这样,Postgres甚至根本不会访问match
表 ,您就会获得超快的结果.
在两列integer
列中,您碰巧遇到了局部最优值:这些索引的大小与简单索引的大小相同.详细信息:
更恐怖,但是很慢
您可以运行相关的子查询,例如 @Giorgi建议,只需正确地运行 :>
SELECT *
FROM (
SELECT username
, (SELECT count(*) FROM match
WHERE loser_id = p.player_id
AND winner_id = 1) AS won
, (SELECT count(*) FROM match
WHERE winner_id = p.player_id
AND loser_id = 1) AS lost
FROM player p
WHERE player_id <> 1
) sub
WHERE (won > 0 OR lost > 0)
ORDER BY username;
适用于小型表,但不能扩展.这需要每个现有播放器在player
上进行顺序扫描,并在match
上进行两次索引扫描.用EXPLAIN ANALYZE
比较性能.
I am looking for a "better" way to perform a query in which I want to show a single player who he has played previously and the associated win-loss record for each such opponent.
Here are the tables involved stripped down to essentials:
create table player (player_id int, username text);
create table match (winner_id int, loser_id int);
insert into player values (1, 'john'), (2, 'mary'), (3, 'bob'), (4, 'alice');
insert into match values (1, 2), (1, 2), (1, 3), (1, 4), (1, 4), (1, 4)
, (2, 1), (4, 1), (4, 1);
Thus, john has a record of 2 wins and 1 loss vs mary; 1 win and 0 losses vs bob; and 3 wins and 2 losses vs alice.
create index idx_winners on match(winner_id);
create index idx_winners on match(loser_id);
I am using Postgres 9.4. Something in the back of my head tells me to consider LATERAL
somehow but I'm having a hard time understanding the "shape" of such.
The following is the query I am using currently but something "feels off". Please help me learn and improve this.
select p.username as opponent,
coalesce(r.won, 0) as won,
coalesce(r.lost, 0) as lost
from (
select m.winner_id, m.loser_id, count(m.*) as won, (
select t.lost
from (
select winner_id, loser_id, count(*) as lost
from match
where loser_id = m.winner_id
and winner_id = m.loser_id
group by winner_id, loser_id
) t
)
from match m
where m.winner_id = 1 -- this would be a parameter
group by m.winner_id, m.loser_id
) r
join player p on p.player_id = r.loser_id;
This works as expected. Just looking to learn some tricks or better yet proper techniques to do the same.
opponent won lost
-------- --- ----
alice 3 2
bob 1 0
mary 2 1
Query
The query is not as simple as it looks at first. The shortest query string does not necessarily yield best performance. This should be as fast as it gets, being as short as possible for that:
SELECT p.username, COALESCE(w.ct, 0) AS won, COALESCE(l.ct, 0) AS lost
FROM (
SELECT loser_id AS player_id, count(*) AS ct
FROM match
WHERE winner_id = 1 -- your player_id here
GROUP BY 1 -- positional reference (not your player_id)
) w
FULL JOIN (
SELECT winner_id AS player_id, count(*) AS ct
FROM match
WHERE loser_id = 1 -- your player_id here
GROUP BY 1
) l USING (player_id)
JOIN player p USING (player_id)
ORDER BY 1;
Result exactly as requested:
username | won | lost
---------+-----+-----
alice | 3 | 2
bob | 1 | 0
mary | 2 | 1
SQL Fiddle - with more revealing test data!
The key feature is the FULL [OUTER] JOIN
between the two subqueries for losses and wins. This produces a table of all players our candidate has played against. The USING
clause in the join condition conveniently merges the two player_id
columns into one.
After that, a single JOIN
to player
to get the name, and COALESCE
to replace NULL with 0. Voilá.
Index
Would be even faster with two multicolumn indexes:
CREATE INDEX idx_winner on match (winner_id, loser_id);
CREATE INDEX idx_loser on match (loser_id, winner_id);
Only if you get index-only scans out of this. Then Postgres does not even visit the match
table at all and you get super-fast results.
With two integer
columns you happen to hit a local optimum: theses indexes have just the same size as the simple ones you had. Details:
Shorter, but slow
You could run correlated subqueries like @Giorgi suggested, just working correctly:
SELECT *
FROM (
SELECT username
, (SELECT count(*) FROM match
WHERE loser_id = p.player_id
AND winner_id = 1) AS won
, (SELECT count(*) FROM match
WHERE winner_id = p.player_id
AND loser_id = 1) AS lost
FROM player p
WHERE player_id <> 1
) sub
WHERE (won > 0 OR lost > 0)
ORDER BY username;
Works fine for small tables, but doesn't scale. This needs a sequential scan on player
and two index scans on match
per existing player. Compare performance with EXPLAIN ANALYZE
.
这篇关于将同一表的多个列上的条件计数相加的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!