根据两列选择每个组中的最佳行 [英] Selecting best row in each group based on two columns

查看:65
本文介绍了根据两列选择每个组中的最佳行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我们有下表,其中每一行代表用户在编程竞赛中提交的内容,id是自动递增的主键,probid标识提交内容的问题,score是提交问题所获得的分数,而date是提交问题时的时间戳.每个用户可以针对自己的问题提交任意多次:

Suppose we have the following table, where each row represents a submission a user made during a programming contest, id is an auto-increment primary key, probid identifies the problem the submission was made to, score is the number of points the submission earned for the problem, and date is the timestamp when the submission was made. Each user can submit as many times as they want to the same problem:

+----+----------+--------+-------+------------+
| id | username | probid | score |    date    |
+----+----------+--------+-------+------------+
|  1 | brian    |      1 |     5 | 1542766686 |
|  2 | alex     |      1 |    10 | 1542766686 |
|  3 | alex     |      2 |     5 | 1542766901 |
|  4 | brian    |      1 |    10 | 1542766944 |
|  5 | jacob    |      2 |    10 | 1542766983 |
|  6 | jacob    |      1 |    10 | 1542767053 |
|  7 | brian    |      2 |     8 | 1542767271 |
|  8 | jacob    |      2 |    10 | 1542767456 |
|  9 | brian    |      2 |     7 | 1542767522 |
+----+----------+--------+-------+------------+

为了对参赛者进行排名,我们需要确定每个用户对每个问题的最佳提交方式. 最好"的提交是得分最高的,并且由提交ID打破联系(即,如果用户两次在同一问题上获得相同的得分,则我们只关心这两个提交中的较早者).这将产生一个如下表:

In order to rank the contestants, we need to determine the best submission each user made to each problem. The "best" submission is the one with the highest score, with ties broken by submission ID (i.e., if the user got the same score on the same problem twice, we only care about the earlier of the two submissions). This would yield a table like the following:

+----------+--------+----+-------+------------+
| username | probid | id | score |    date    |
+----------+--------+----+-------+------------+
| alex     |      1 |  2 |    10 | 1542766686 |
| alex     |      2 |  3 |     5 | 1542766901 |
| brian    |      1 |  4 |    10 | 1542766944 |
| brian    |      2 |  7 |     8 | 1542767271 |
| jacob    |      1 |  6 |    10 | 1542767053 |
| jacob    |      2 |  5 |    10 | 1542766983 |
+----------+--------+----+-------+------------+

如何编写查询来完成此操作?

How can I write a query to accomplish this?

推荐答案

在MySQL 8.0.2之前的版本中,我们可以模拟技术中,我们首先按特定顺序获取数据(取决于手头的问题陈述).

In pre-MySQL 8.0.2, we can emulate Row_Number() functionality using User-defined Variables. In this technique, we firstly get the data in a particular order (depends on the problem statement at hand).

在您的情况下,在probidusername的分区中,我们需要按降序对分数进行排名,其中具有较低时间戳值的行被赋予较高的优先级(以打破平局).因此,我们将ORDER BY probid, username, score DESC, date ASC.

In your case, within a partition of probid and username, we need to rank scores in descending order, with the row having lower timestamp value given higher priority (to break the ties). So, we will ORDER BY probid, username, score DESC, date ASC.

现在,我们可以将此结果集用作派生表,并确定行号.就像循环技术(我们在应用程序代码中使用的,例如:PHP)一样.我们将前一行的值存储在用户定义的变量中,并使用条件 CASE .. WHEN 表达式可根据前一行检查当前行的值.然后,相应地分配行号.

Now, we can use this result-set as a Derived Table, and determine the row number. It will be like a Looping technique (which we use in application code, eg: PHP). We would store the previous row values in the User-defined variables, and use conditional CASE .. WHEN expressions to check the current row's value(s) against the previous row. And, then assign row number accordingly.

最终,我们将仅考虑行号为1 的行,并(如果需要)按usernameprobid对其进行排序.

Eventually, we will consider only those rows where row number is 1, and (if required), sort it by username and probid.

查询

SELECT dt2.username,
       dt2.probid,
       dt2.id,
       dt2.score,
       dt2.date
FROM   (SELECT @rn := CASE
                        WHEN @un = dt1.username
                             AND @pid = dt1.probid THEN @rn + 1
                        ELSE 1
                      end          AS row_no,
               @un := dt1.username AS username,
               @pid := dt1.probid  AS probid,
               dt1.id,
               dt1.score,
               dt1.date
        FROM   (SELECT id,
                       username,
                       probid,
                       score,
                       date
                FROM   your_table
                ORDER  BY username,
                          probid,
                          score DESC,
                          date ASC) AS dt1
               CROSS JOIN (SELECT @un := '',
                                  @pid := 0,
                                  @rn := 0) AS user_init_vars) AS dt2
WHERE  dt2.row_no = 1  
ORDER BY dt2.username, dt2.probid;

结果

| username | probid | id  | score | date       |
| -------- | ------ | --- | ----- | ---------- |
| alex     | 1      | 2   | 10    | 1542766686 |
| alex     | 2      | 3   | 5     | 1542766901 |
| brian    | 1      | 4   | 10    | 1542766944 |
| brian    | 2      | 7   | 8     | 1542767271 |
| jacob    | 1      | 6   | 10    | 1542767053 |
| jacob    | 2      | 5   | 10    | 1542766983 |


在DB Fiddle上查看

这篇关于根据两列选择每个组中的最佳行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆