Add up conditional counts on multiple columns of the same table

Views: 175 | Published: 2019/9/19 16:07:49 | Tags: sql, postgresql, join, aggregate-functions


Problem description

I am looking for a "better" way to perform a query in which I want to show a single player who he has played previously and the associated win-loss record for each such opponent.

Here are the tables involved stripped down to essentials:

create table player (player_id int, username text);
create table match (winner_id int, loser_id int);

insert into player values (1, 'john'), (2, 'mary'), (3, 'bob'), (4, 'alice');
insert into match values (1, 2), (1, 2), (1, 3), (1, 4), (1, 4), (1, 4)
                       , (2, 1), (4, 1), (4, 1);

Thus, john has a record of 2 wins and 1 loss vs mary; 1 win and 0 losses vs bob; and 3 wins and 2 losses vs alice.

create index idx_winners on match(winner_id);
create index idx_losers on match(loser_id);  -- index names must be unique, so the second one cannot also be idx_winners

I am using Postgres 9.4. Something in the back of my head tells me to consider LATERAL somehow but I'm having a hard time understanding the "shape" of such.

The following is the query I am using currently but something "feels off". Please help me learn and improve this.

select p.username as opponent,
       coalesce(r.won, 0) as won,
       coalesce(r.lost, 0) as lost
from (
    select m.winner_id, m.loser_id, count(m.*) as won, (
        select t.lost
        from (
            select winner_id, loser_id, count(*) as lost
            from match
            where loser_id = m.winner_id
            and winner_id = m.loser_id
            group by winner_id, loser_id
        ) t
    )
    from match m
    where m.winner_id = 1   -- this would be a parameter
    group by m.winner_id, m.loser_id
) r
join player p on p.player_id = r.loser_id;

This works as expected. Just looking to learn some tricks or better yet proper techniques to do the same.

opponent  won  lost
--------  ---  ----
alice     3    2
bob       1    0
mary      2    1

Solution

Query

The query is not as simple as it looks at first. The shortest query string does not necessarily yield best performance. This should be as fast as it gets, being as short as possible for that:

SELECT p.username, COALESCE(w.ct, 0) AS won, COALESCE(l.ct, 0) AS lost
FROM  (
   SELECT loser_id AS player_id, count(*) AS ct
   FROM   match
   WHERE  winner_id = 1  -- your player_id here
   GROUP  BY 1           -- positional reference (not your player_id)
   ) w
FULL JOIN (
   SELECT winner_id AS player_id, count(*) AS ct
   FROM   match
   WHERE  loser_id = 1   -- your player_id here
   GROUP  BY 1
   ) l USING (player_id)
JOIN   player p USING (player_id)
ORDER  BY 1;

Result exactly as requested:

username | won | lost
---------+-----+-----
alice    | 3   | 2
bob      | 1   | 0
mary     | 2   | 1

SQL Fiddle - with more revealing test data!

The key feature is the FULL [OUTER] JOIN between the two subqueries for losses and wins. This produces a table of all players our candidate has played against. The USING clause in the join condition conveniently merges the two player_id columns into one.

After that, a single JOIN to player to get the name, and COALESCE to replace NULL with 0. Voilà.
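
To see what the FULL JOIN contributes, it can help to look at the joined subqueries before COALESCE and the join to player are applied. The following is only a sketch against the sample data from the question; the aliases won_raw and lost_raw are made up for illustration and do not appear in the original query.

```sql
-- Intermediate result of the FULL JOIN, before COALESCE and the join to player.
-- won_raw / lost_raw are illustrative aliases, not part of the original query.
SELECT player_id, w.ct AS won_raw, l.ct AS lost_raw
FROM  (
   SELECT loser_id AS player_id, count(*) AS ct
   FROM   match
   WHERE  winner_id = 1
   GROUP  BY 1
   ) w
FULL JOIN (
   SELECT winner_id AS player_id, count(*) AS ct
   FROM   match
   WHERE  loser_id = 1
   GROUP  BY 1
   ) l USING (player_id)
ORDER  BY player_id;

-- With the sample data:
--  player_id | won_raw | lost_raw
--          2 |       2 |        1
--          3 |       1 |   (NULL)    <- bob never beat john
--          4 |       3 |        2
```

An opponent who had only ever beaten player 1 would show NULL in won_raw instead. A LEFT JOIN starting from w would drop that row entirely, which is why the join has to be outer in both directions.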

Index

This would be even faster with two multicolumn indexes:

CREATE INDEX idx_winner ON match (winner_id, loser_id);
CREATE INDEX idx_loser  ON match (loser_id, winner_id);

That only helps if you get index-only scans out of this. Then Postgres does not even visit the match table at all and you get super-fast results.
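
One way to check whether the planner actually chooses index-only scans is to look at the plan for one of the subqueries. This is a sketch, assuming the two multicolumn indexes idx_winner and idx_loser have been created; on a table as small as the sample data Postgres will likely prefer a sequential scan, so the output is only meaningful with realistic row counts.

```sql
-- Index-only scans rely on the visibility map, which VACUUM maintains.
VACUUM ANALYZE match;

-- Look for "Index Only Scan using idx_winner" and a low "Heap Fetches" count.
EXPLAIN (ANALYZE, BUFFERS)
SELECT loser_id AS player_id, count(*) AS ct
FROM   match
WHERE  winner_id = 1
GROUP  BY 1;
```

The same check with loser_id = 1 and winner_id in the SELECT list exercises idx_loser.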

With two integer columns you happen to hit a local optimum: these indexes have just the same size as the simple ones you had. Details:

Is a composite index also good for queries on the first field?

Shorter, but slow

You could run correlated subqueries like @Giorgi suggested, just working correctly:

SELECT *
FROM  (
   SELECT username
       , (SELECT count(*) FROM match
          WHERE  loser_id = p.player_id AND winner_id = 1) AS won
       , (SELECT count(*) FROM match
          WHERE  winner_id = p.player_id AND loser_id = 1) AS lost
   FROM   player p
   WHERE  player_id <> 1
   ) sub
WHERE (won > 0 OR lost > 0)
ORDER  BY username;

Works fine for small tables, but doesn't scale. This needs a sequential scan on player and two index scans on match per existing player. Compare performance with EXPLAIN ANALYZE.
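
For comparison, here is one more variant that is not part of the answer above: conditional aggregation with the FILTER clause, which is available from Postgres 9.4. It is a sketch, assuming no match row lists the same player as both winner and loser. It selects only the rows that involve player 1 and counts wins and losses per opponent in a single aggregate step. Whether it beats the FULL JOIN query depends on the data and the indexes, so measure it with EXPLAIN ANALYZE as well.

```sql
-- Swapped-in technique (not from the original answer): aggregate FILTER clauses.
SELECT p.username
     , count(*) FILTER (WHERE m.winner_id = 1) AS won
     , count(*) FILTER (WHERE m.loser_id  = 1) AS lost
FROM   match m
JOIN   player p ON p.player_id = CASE WHEN m.winner_id = 1
                                      THEN m.loser_id     -- player 1 won: opponent is the loser
                                      ELSE m.winner_id    -- player 1 lost: opponent is the winner
                                 END
WHERE  1 IN (m.winner_id, m.loser_id)   -- only matches that involve player 1
GROUP  BY p.player_id, p.username       -- player_id keeps duplicate usernames apart
ORDER  BY p.username;
```

With the sample data from the question this returns the same three rows as the FULL JOIN query.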
