根据两列选择每个组中的最佳行 [英] Selecting best row in each group based on two columns
问题描述
假设我们有下表,其中每一行代表用户在编程竞赛中提交的内容,id
是自动递增的主键,probid
标识提交内容的问题,score
是提交问题所获得的分数,而date
是提交问题时的时间戳.每个用户可以针对自己的问题提交任意多次:
Suppose we have the following table, where each row represents a submission a user made during a programming contest, id
is an auto-increment primary key, probid
identifies the problem the submission was made to, score
is the number of points the submission earned for the problem, and date
is the timestamp when the submission was made. Each user can submit as many times as they want to the same problem:
+----+----------+--------+-------+------------+
| id | username | probid | score | date |
+----+----------+--------+-------+------------+
| 1 | brian | 1 | 5 | 1542766686 |
| 2 | alex | 1 | 10 | 1542766686 |
| 3 | alex | 2 | 5 | 1542766901 |
| 4 | brian | 1 | 10 | 1542766944 |
| 5 | jacob | 2 | 10 | 1542766983 |
| 6 | jacob | 1 | 10 | 1542767053 |
| 7 | brian | 2 | 8 | 1542767271 |
| 8 | jacob | 2 | 10 | 1542767456 |
| 9 | brian | 2 | 7 | 1542767522 |
+----+----------+--------+-------+------------+
为了对参赛者进行排名,我们需要确定每个用户对每个问题的最佳提交方式. 最好"的提交是得分最高的,并且由提交ID打破联系(即,如果用户两次在同一问题上获得相同的得分,则我们只关心这两个提交中的较早者).这将产生一个如下表:
In order to rank the contestants, we need to determine the best submission each user made to each problem. The "best" submission is the one with the highest score, with ties broken by submission ID (i.e., if the user got the same score on the same problem twice, we only care about the earlier of the two submissions). This would yield a table like the following:
+----------+--------+----+-------+------------+
| username | probid | id | score | date |
+----------+--------+----+-------+------------+
| alex | 1 | 2 | 10 | 1542766686 |
| alex | 2 | 3 | 5 | 1542766901 |
| brian | 1 | 4 | 10 | 1542766944 |
| brian | 2 | 7 | 8 | 1542767271 |
| jacob | 1 | 6 | 10 | 1542767053 |
| jacob | 2 | 5 | 10 | 1542766983 |
+----------+--------+----+-------+------------+
如何编写查询来完成此操作?
How can I write a query to accomplish this?
推荐答案
在MySQL 8.0.2之前的版本中,我们可以模拟用户定义的变量.在此技术中,我们首先按特定顺序获取数据(取决于手头的问题陈述).
In pre-MySQL 8.0.2, we can emulate Row_Number()
functionality using User-defined Variables. In this technique, we firstly get the data in a particular order (depends on the problem statement at hand).
在您的情况下,在probid
和username
的分区中,我们需要按降序对分数进行排名,其中具有较低时间戳值的行被赋予较高的优先级(以打破平局).因此,我们将ORDER BY probid, username, score DESC, date ASC
.
In your case, within a partition of probid
and username
, we need to rank scores in descending order, with the row having lower timestamp value given higher priority (to break the ties). So, we will ORDER BY probid, username, score DESC, date ASC
.
现在,我们可以将此结果集用作派生表,并确定行号.就像循环技术(我们在应用程序代码中使用的,例如:PHP)一样.我们将前一行的值存储在用户定义的变量中,并使用条件 CASE .. WHEN
表达式可根据前一行检查当前行的值.然后,相应地分配行号.
Now, we can use this result-set as a Derived Table, and determine the row number. It will be like a Looping technique (which we use in application code, eg: PHP). We would store the previous row values in the User-defined variables, and use conditional CASE .. WHEN
expressions to check the current row's value(s) against the previous row. And, then assign row number accordingly.
最终,我们将仅考虑行号为1 的行,并(如果需要)按username
和probid
对其进行排序.
Eventually, we will consider only those rows where row number is 1, and (if required), sort it by username
and probid
.
查询
SELECT dt2.username,
dt2.probid,
dt2.id,
dt2.score,
dt2.date
FROM (SELECT @rn := CASE
WHEN @un = dt1.username
AND @pid = dt1.probid THEN @rn + 1
ELSE 1
end AS row_no,
@un := dt1.username AS username,
@pid := dt1.probid AS probid,
dt1.id,
dt1.score,
dt1.date
FROM (SELECT id,
username,
probid,
score,
date
FROM your_table
ORDER BY username,
probid,
score DESC,
date ASC) AS dt1
CROSS JOIN (SELECT @un := '',
@pid := 0,
@rn := 0) AS user_init_vars) AS dt2
WHERE dt2.row_no = 1
ORDER BY dt2.username, dt2.probid;
结果
| username | probid | id | score | date |
| -------- | ------ | --- | ----- | ---------- |
| alex | 1 | 2 | 10 | 1542766686 |
| alex | 2 | 3 | 5 | 1542766901 |
| brian | 1 | 4 | 10 | 1542766944 |
| brian | 2 | 7 | 8 | 1542767271 |
| jacob | 1 | 6 | 10 | 1542767053 |
| jacob | 2 | 5 | 10 | 1542766983 |
这篇关于根据两列选择每个组中的最佳行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!