Add up conditional counts on multiple columns of the same table

Question

I am looking for a "better" way to perform a query that shows a single player whom he has played previously, along with the win-loss record against each such opponent.

Here are the tables involved stripped down to essentials:

create table player (player_id int, username text);
create table match (winner_id int, loser_id int);

insert into player values (1, 'john'), (2, 'mary'), (3, 'bob'), (4, 'alice');
insert into match values (1, 2), (1, 2), (1, 3), (1, 4), (1, 4), (1, 4)
                       , (2, 1), (4, 1), (4, 1);

Thus, john has a record of 2 wins and 1 loss vs mary; 1 win and 0 losses vs bob; and 3 wins and 2 losses vs alice.

create index idx_winners on match(winner_id);
create index idx_losers on match(loser_id);

I am using Postgres 9.4. Something in the back of my head tells me to consider LATERAL somehow but I'm having a hard time understanding the "shape" of such.

The following is the query I am using currently but something "feels off". Please help me learn and improve this.

select p.username as opponent, 
       coalesce(r.won, 0) as won, 
       coalesce(r.lost, 0) as lost
from (
    select m.winner_id, m.loser_id, count(m.*) as won, (
        select t.lost
        from (
            select winner_id, loser_id, count(*) as lost
            from match
            where loser_id = m.winner_id
            and winner_id = m.loser_id
            group by winner_id, loser_id
        ) t 
    )   
    from match m
    where m.winner_id = 1   -- this would be a parameter
    group by m.winner_id, m.loser_id
) r 
join player p on p.player_id = r.loser_id;

This works as expected. Just looking to learn some tricks or better yet proper techniques to do the same.

opponent  won  lost
--------  ---  ----
alice     3    2
bob       1    0
mary      2    1

Solution

Query

The query is not as simple as it looks at first. The shortest query string does not necessarily yield the best performance. This should be as fast as it gets, while being as short as possible for that speed:

SELECT p.username, COALESCE(w.ct, 0) AS won, COALESCE(l.ct, 0) AS lost
FROM  (
   SELECT loser_id AS player_id, count(*) AS ct
   FROM   match
   WHERE  winner_id = 1  -- your player_id here
   GROUP  BY 1           -- positional reference (not your player_id)
   ) w
FULL JOIN (
   SELECT winner_id AS player_id, count(*) AS ct
   FROM   match
   WHERE  loser_id = 1   -- your player_id here
   GROUP  BY 1
   ) l USING (player_id)
JOIN   player p USING (player_id)
ORDER  BY 1;

Result exactly as requested:

username | won | lost
---------+-----+-----
alice    | 3   | 2
bob      | 1   | 0
mary     | 2   | 1

SQL Fiddle - with more revealing test data!

The key feature is the FULL [OUTER] JOIN between the two subqueries for losses and wins. This produces a table of all players our candidate has played against. The USING clause in the join condition conveniently merges the two player_id columns into one.

After that, a single JOIN to player to get the name, and COALESCE to replace NULL with 0. Voilà.
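
To see why both FULL JOIN and COALESCE are needed, it can help to look at the joined counts before the NULLs are replaced. The sketch below is self-contained: it inlines the question's match rows in a CTE and adds one hypothetical row (5, 1) for a player who beat player 1 but never lost to him. That extra row and the column aliases won_raw and lost_raw are assumptions for illustration, not part of the original answer.

```sql
-- The question's rows plus one hypothetical match (5, 1):
-- player 5 beat player 1 once and never lost to him.
WITH match (winner_id, loser_id) AS (
   VALUES (1, 2), (1, 2), (1, 3), (1, 4), (1, 4), (1, 4)
        , (2, 1), (4, 1), (4, 1), (5, 1)
   )
SELECT player_id, w.ct AS won_raw, l.ct AS lost_raw
FROM  (
   SELECT loser_id AS player_id, count(*) AS ct
   FROM   match
   WHERE  winner_id = 1   -- wins of player 1, per opponent
   GROUP  BY 1
   ) w
FULL JOIN (
   SELECT winner_id AS player_id, count(*) AS ct
   FROM   match
   WHERE  loser_id = 1    -- losses of player 1, per opponent
   GROUP  BY 1
   ) l USING (player_id)
ORDER  BY player_id;

-- player_id | won_raw | lost_raw
-- ----------+---------+---------
--         2 |       2 |        1
--         3 |       1 |     NULL   -- only in w: never beat player 1
--         4 |       3 |        2
--         5 |    NULL |        1   -- only in l: never lost to player 1
```

A LEFT JOIN from w would drop opponent 5, and without COALESCE the missing side shows up as NULL rather than 0.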

Index

This would be even faster with two multicolumn indexes:

CREATE INDEX idx_winner on match (winner_id, loser_id);
CREATE INDEX idx_loser  on match (loser_id, winner_id);

This only helps if you get index-only scans out of it. In that case, Postgres does not even visit the match table at all and you get super-fast results.
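
One way to check whether an index-only scan is actually used is to inspect the plan of one of the subqueries. This is only a sketch and assumes the match table and the idx_winner index from above already exist. On a table as tiny as the sample data the planner will most likely choose a sequential scan, so it needs a realistically sized table to be meaningful.

```sql
-- Index-only scans rely on the visibility map, which VACUUM maintains;
-- ANALYZE refreshes the planner statistics at the same time.
VACUUM ANALYZE match;

-- EXPLAIN with ANALYZE really executes the query and reports actual timings.
EXPLAIN (ANALYZE, BUFFERS)
SELECT loser_id AS player_id, count(*) AS ct
FROM   match
WHERE  winner_id = 1
GROUP  BY 1;

-- In the output, look for a node like "Index Only Scan using idx_winner"
-- and a low "Heap Fetches" count. Many heap fetches mean Postgres still
-- had to visit the table to check row visibility.
```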

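With two integer columns you happen to hit a local optimum: these indexes have just the same size as the simple ones you had. On typical 64-bit systems the data of an index entry is padded to a multiple of 8 bytes, so a single 4-byte integer occupies 8 bytes anyway and the second integer fits into space that would otherwise be padding.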
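
The size claim can be checked on a scratch table. The following sketch uses hypothetical names (match_demo and its two indexes) and random data, so it does not touch the real match table. No expected figures are given on purpose. In the Postgres 9.4 setup from the question the two sizes should come out about equal. From PostgreSQL 13 on, B-tree deduplication can make the single-column index smaller when values repeat a lot.

```sql
-- Hypothetical scratch table with 100000 random (winner, loser) pairs.
CREATE TEMP TABLE match_demo AS
SELECT (random() * 1000)::int AS winner_id
     , (random() * 1000)::int AS loser_id
FROM   generate_series(1, 100000);

-- One single-column and one two-column index on the same data.
CREATE INDEX match_demo_single ON match_demo (winner_id);
CREATE INDEX match_demo_multi  ON match_demo (winner_id, loser_id);

-- Compare the on-disk size of both indexes.
SELECT pg_size_pretty(pg_relation_size('match_demo_single')) AS single_col
     , pg_size_pretty(pg_relation_size('match_demo_multi'))  AS two_col;
```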

Shorter, but slow

You could run correlated subqueries like @Giorgi suggested, only done correctly:

SELECT *
FROM  (
   SELECT username
       , (SELECT count(*) FROM match
          WHERE  loser_id  = p.player_id
          AND    winner_id = 1) AS won
       , (SELECT count(*) FROM match
          WHERE  winner_id = p.player_id
          AND    loser_id  = 1) AS lost
   FROM   player p
   WHERE  player_id <> 1
   ) sub
WHERE (won > 0 OR lost > 0)
ORDER  BY username;

Works fine for small tables, but doesn't scale. This needs a sequential scan on player and two index scans on match per existing player. Compare performance with EXPLAIN ANALYZE.
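
As a further candidate for such a comparison, and not something taken from the answer above, the counts can also be collected in a single pass over match with the aggregate FILTER clause, which is available since Postgres 9.4. The sketch below inlines the question's data in CTEs so it runs on its own. The CASE expression that picks "the other player" is an assumption of this sketch, and its performance relative to the FULL JOIN query has not been measured here.

```sql
WITH player (player_id, username) AS (
   VALUES (1, 'john'), (2, 'mary'), (3, 'bob'), (4, 'alice')
   )
, match (winner_id, loser_id) AS (
   VALUES (1, 2), (1, 2), (1, 3), (1, 4), (1, 4), (1, 4)
        , (2, 1), (4, 1), (4, 1)
   )
SELECT p.username
     , count(*) FILTER (WHERE m.winner_id = 1) AS won   -- player 1 won
     , count(*) FILTER (WHERE m.loser_id  = 1) AS lost  -- player 1 lost
FROM   match m
JOIN   player p
       -- the opponent is whichever side of the match is not player 1
       ON p.player_id = CASE WHEN m.winner_id = 1 THEN m.loser_id
                             ELSE m.winner_id END
WHERE  1 IN (m.winner_id, m.loser_id)   -- only matches involving player 1
GROUP  BY p.player_id, p.username
ORDER  BY p.username;

-- username | won | lost
-- ---------+-----+-----
-- alice    |   3 |    2
-- bob      |   1 |    0
-- mary     |   2 |    1
```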
