仅在Oracle中仅在一列上有所不同 [英] Distinct on one column only in Oracle
问题描述
我想在下表中使用distinct,但仅在"PlayerID"列上使用.这是我目前所拥有的:
I would like to use distinct on the following table, but only on the 'PlayerID' column. This is what I have at the moment:
MATCHID PLAYERID TEAMID MATCHDATE STARTDATE
---------- ---------- ---------- --------- ---------
20 5 2 14-JAN-12 01-JUN-11
20 5 4 14-JAN-12 01-JUN-10
20 7 4 14-JAN-12 01-JUN-11
20 7 2 14-JAN-12 01-JUN-10
20 10 4 14-JAN-12 01-JUN-11
20 11 2 14-JAN-12 01-JUN-10
20 13 2 14-JAN-12 01-JUN-11
20 16 4 14-JAN-12 01-JUN-10
20 17 4 14-JAN-12 01-JUN-10
20 18 4 14-JAN-12 01-JUN-10
20 19 2 14-JAN-12 01-JUN-11
这就是我想要的,因此将显示每个"PlayerID"的最高"StartDate",而忽略下一行:
And this is what I want, so that the highest 'StartDate' for each 'PlayerID' is shown and the next row ignored:
MATCHID PLAYERID TEAMID MATCHDATE STARTDATE
---------- ---------- ---------- --------- ---------
20 5 2 14-JAN-12 01-JUN-11
20 7 4 14-JAN-12 01-JUN-11
20 10 4 14-JAN-12 01-JUN-11
20 11 2 14-JAN-12 01-JUN-10
20 13 2 14-JAN-12 01-JUN-11
20 16 4 14-JAN-12 01-JUN-10
20 17 4 14-JAN-12 01-JUN-10
20 18 4 14-JAN-12 01-JUN-10
20 19 2 14-JAN-12 01-JUN-11
当前SQL:
SELECT pi.MatchID, pi.PlayerID, t.TeamID, m.MatchDate, pf.StartDate
FROM Plays_In pi, Match m, Plays_A pa, Team t, Plays_For pf, Made_Up_Of muo, Season s
WHERE pi.MatchID = m.MatchID
AND m.MatchID = pa.MatchID
AND pa.TeamID = t.TeamID
AND pf.PlayerID = pi.PlayerID
AND pf.TeamID = t.TeamID
AND muo.MatchID = pi.MatchID
AND muo.SeasonID = s.SeasonID
AND pi.MatchID = '&match_id'
AND m.MatchDate >= pf.StartDate
ORDER BY pi.MatchID ASC, pi.PlayerID ASC, pf.StartDate DESC;
这是一个Oracle数据库.
It's an Oracle database.
谢谢.
推荐答案
几点...
-
除非使用对
Made_Up_Of
和Season
的联接来过滤出行,否则不需要这些表.我把它们留在这里了;您可以根据需要将它们重新添加.
Unless you're using the joins to
Made_Up_Of
andSeason
to filter out rows, you don't need these tables. I've left them out here; you can add them back in if you need them.
Mark Tickner是正确的,应该使用ANSI JOIN语法.关于它的好处(不是标准的)是,它将连接逻辑与要连接的表放在一起.一旦习惯了,我认为您会发现它更可取.
Mark Tickner is correct that you should use the ANSI JOIN syntax. The nice thing about it (other than being standard) is that it puts the join logic right with the table being joined. Once you get used to it I think you'll find it preferable.
您真正想要的是每个PlayerID
的最大值pf.StartDate
,这非常适合分析性ROW_NUMBER()
函数. PARTITION BY pi.PlayerID ORDER BY pf.StartDate DESC
基本上将值1
分配给具有每个玩家最近排序日期的行.外部过滤掉所有排名为1
的行.
What you're really after is the maximum pf.StartDate
for each PlayerID
, which is a nice fit for the analytical ROW_NUMBER()
function. The PARTITION BY pi.PlayerID ORDER BY pf.StartDate DESC
will basically assign the value 1
to the row with each player's most recent sort date. The outer filters out all rows except those with the 1
ranking.
您也可以使用RANK()
和DENSE_RANK()
分析功能分配排名,但是如果玩家最近的日期并列,那么所有并列的日期将排名第一,您将获得该玩家的多行.在这种情况下,每个玩家只希望排一行,请改用ROW_NUMBER()
.
You can also assign rankings with the RANK()
and DENSE_RANK()
analytical functions, but if a player has a tie for the most recent date then all the tied dates will be ranked #1 and you'll get multiple rows for that player. In situations like this where you only want one row per player, use ROW_NUMBER()
instead.
将所有内容放在一起,您会得到:
Put it all together and you get this:
SELECT MatchID, PlayerID, TeamID, MatchDte, StartDate FROM (
SELECT
pi.MatchID,
pi.PlayerID,
t.TeamID,
m.MatchDate,
pf.StartDate,
ROW_NUMBER() OVER (PARTITION BY pi.PlayerID ORDER BY pf.StartDate DESC) AS StartDateRank
FROM Plays_In pi
INNER JOIN Match m ON pi.MatchID = m.MatchID
INNER JOIN Plays_A pa ON m.MatchID = pa.MatchID
INNER JOIN Team t ON pa.TeamID = t.TeamID
INNER JOIN Plays_For pf ON pf.PlayerID = pi.PlayerID AND pf.TeamID = t.TeamID
WHERE pi.MatchID = '&match_id'
AND m.MatchDate >= pf.StartDate
)
WHERE StartDateRank = 1
ORDER BY MatchID, PlayerID
最后一点:基于WHERE pi.MatchID = '&match_id'
,您可能正在使用PHP作为前端,并使用mysql
函数进行查询.如果是这样,请改为查看mysqli
或PDO
,因为它们可以保护您免受SQL注入的侵害. mysql
函数(已正式弃用)不会.
One final point: based on the WHERE pi.MatchID = '&match_id'
it looks like you may be using PHP as your front end and the mysql
functions to do the query. If so, please look into mysqli
or PDO
instead, as they'll protect you from SQL Injection. The mysql
functions (which are officially deprecated) will not.
附录:有关ROW_NUMBER
的更多信息,非常感谢@AndriyM.
Addendum: More information about ROW_NUMBER
, with many thanks to @AndriyM.
对于ROW_NUMBER
,如果玩家具有多于一个最近日期的行,则仅将其中一行指定为ROW_NUMBER = 1
,并且将或多或少地随机选择该行.这是一个示例,其中玩家的最近日期是2013年5月1日,并且该玩家具有该日期的三行:
With ROW_NUMBER
, if a player has more than one row with the most recent date, only one of the rows will be assigned as ROW_NUMBER = 1
, and that row will be picked more or less randomly. Here's an example, where a player's most recent date is 5/1/2013 and the player has three rows with this date:
pi.MatchID pi.PlayerID pf.StartDate
---------- ----------- ------------
100 1000 05/01/2013 <-- could be ROW_NUMBER = 1
101 1000 04/29/2013
105 1000 05/01/2013 <-- could be ROW_NUMBER = 1
102 1000 05/01/2013 <-- could be ROW_NUMBER = 1
107 1000 04/18/2013
请注意,上面的行中只有一个会被分配ROW_NUMBER = 1
,并且它可以是其中的任何一个. Oracle将由您决定,而不是您.
Note that only one of the rows above will be assigned ROW_NUMBER = 1
, and it can be any of them. Oracle will decide, not you.
如果不确定性是一个问题,请按其他列排序以获得明确的赢家.对于此示例,最高的pi.MatchID
将用于确定"true" ROW_NUMBER = 1
:
If this uncertainty is a problem, order by additional columns to get a clear winner. For this example, the highest pi.MatchID
will be used to determine the "true" ROW_NUMBER = 1
:
-- replace `ROW_NUMBER...` in the query above with this:
ROW_NUMBER() OVER (
PARTITION BY pi.PlayerID
ORDER BY pf.StartDate DESC, pi.MatchID DESC) AS StartDateRank
现在,如果要为最高的pf.StartDate
设置一个平局,Oracle会在最高的pf.StartDate
的行子集中寻找最高的pi.MatchID
.事实证明,只有一行满足以下条件:
Now if there's a tie for the highest pf.StartDate
, Oracle looks for the highest pi.MatchID
within the subset of rows with the highest pf.StartDate
. As it turns out, only one row satisfies this condition:
pi.MatchID pi.PlayerID pf.StartDate
---------- ----------- ------------
100 1000 05/01/2013
101 1000 04/29/2013
105 1000 05/01/2013 <-- is ROW_NUMBER = 1: highest MatchID for
-- most recent StartDate (5/1/2013)
102 1000 05/01/2013
107 1000 04/18/2013 <-- not considered: has the highest MatchID but isn't
-- in the subset with the most recent StartDate
这篇关于仅在Oracle中仅在一列上有所不同的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!