仅在Oracle中仅在一列上有所不同 [英] Distinct on one column only in Oracle

查看:80
本文介绍了仅在Oracle中仅在一列上有所不同的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想在下表中使用distinct,但仅在"PlayerID"列上使用.这是我目前所拥有的:

I would like to use distinct on the following table, but only on the 'PlayerID' column. This is what I have at the moment:

   MATCHID   PLAYERID     TEAMID MATCHDATE STARTDATE
---------- ---------- ---------- --------- ---------
        20          5          2 14-JAN-12 01-JUN-11
        20          5          4 14-JAN-12 01-JUN-10
        20          7          4 14-JAN-12 01-JUN-11
        20          7          2 14-JAN-12 01-JUN-10
        20         10          4 14-JAN-12 01-JUN-11
        20         11          2 14-JAN-12 01-JUN-10
        20         13          2 14-JAN-12 01-JUN-11
        20         16          4 14-JAN-12 01-JUN-10
        20         17          4 14-JAN-12 01-JUN-10
        20         18          4 14-JAN-12 01-JUN-10
        20         19          2 14-JAN-12 01-JUN-11

这就是我想要的,因此将显示每个"PlayerID"的最高"StartDate",而忽略下一行:

And this is what I want, so that the highest 'StartDate' for each 'PlayerID' is shown and the next row ignored:

   MATCHID   PLAYERID     TEAMID MATCHDATE STARTDATE
---------- ---------- ---------- --------- ---------
        20          5          2 14-JAN-12 01-JUN-11
        20          7          4 14-JAN-12 01-JUN-11
        20         10          4 14-JAN-12 01-JUN-11
        20         11          2 14-JAN-12 01-JUN-10
        20         13          2 14-JAN-12 01-JUN-11
        20         16          4 14-JAN-12 01-JUN-10
        20         17          4 14-JAN-12 01-JUN-10
        20         18          4 14-JAN-12 01-JUN-10
        20         19          2 14-JAN-12 01-JUN-11

当前SQL:

SELECT pi.MatchID, pi.PlayerID, t.TeamID, m.MatchDate, pf.StartDate
FROM Plays_In pi, Match m, Plays_A pa, Team t, Plays_For pf, Made_Up_Of muo, Season s
WHERE pi.MatchID = m.MatchID
AND m.MatchID = pa.MatchID
AND pa.TeamID = t.TeamID
AND pf.PlayerID = pi.PlayerID
AND pf.TeamID = t.TeamID
AND muo.MatchID = pi.MatchID
AND muo.SeasonID = s.SeasonID
AND pi.MatchID = '&match_id'
AND m.MatchDate >= pf.StartDate
ORDER BY pi.MatchID ASC, pi.PlayerID ASC, pf.StartDate DESC;

这是一个Oracle数据库.

It's an Oracle database.

谢谢.

推荐答案

几点...

  • 除非使用对Made_Up_OfSeason的联接来过滤出行,否则不需要这些表.我把它们留在这里了;您可以根据需要将它们重新添加.

  • Unless you're using the joins to Made_Up_Of and Season to filter out rows, you don't need these tables. I've left them out here; you can add them back in if you need them.

Mark Tickner是正确的,应该使用ANSI JOIN语法.关于它的好处(不是标准的)是,它将连接逻辑与要连接的表放在一起.一旦习惯了,我认为您会发现它更可取.

Mark Tickner is correct that you should use the ANSI JOIN syntax. The nice thing about it (other than being standard) is that it puts the join logic right with the table being joined. Once you get used to it I think you'll find it preferable.

您真正想要的是每个PlayerID的最大值pf.StartDate,这非常适合分析性ROW_NUMBER()函数. PARTITION BY pi.PlayerID ORDER BY pf.StartDate DESC基本上将值1分配给具有每个玩家最近排序日期的行.外部过滤掉所有排名为1的行.

What you're really after is the maximum pf.StartDate for each PlayerID, which is a nice fit for the analytical ROW_NUMBER() function. The PARTITION BY pi.PlayerID ORDER BY pf.StartDate DESC will basically assign the value 1 to the row with each player's most recent sort date. The outer filters out all rows except those with the 1 ranking.

您也可以使用RANK()DENSE_RANK()分析功能分配排名,但是如果玩家最近的日期并列,那么所有并列的日期将排名第一,您将获得该玩家的多行.在这种情况下,每个玩家只希望排一行,请改用ROW_NUMBER().

You can also assign rankings with the RANK() and DENSE_RANK() analytical functions, but if a player has a tie for the most recent date then all the tied dates will be ranked #1 and you'll get multiple rows for that player. In situations like this where you only want one row per player, use ROW_NUMBER() instead.

将所有内容放在一起,您会得到:

Put it all together and you get this:

SELECT MatchID, PlayerID, TeamID, MatchDte, StartDate FROM (
  SELECT
    pi.MatchID,
    pi.PlayerID,
    t.TeamID,
    m.MatchDate,
    pf.StartDate,
    ROW_NUMBER() OVER (PARTITION BY pi.PlayerID ORDER BY pf.StartDate DESC) AS StartDateRank
  FROM Plays_In pi
  INNER JOIN Match m ON pi.MatchID = m.MatchID
  INNER JOIN Plays_A pa ON m.MatchID = pa.MatchID
  INNER JOIN Team t ON pa.TeamID = t.TeamID
  INNER JOIN Plays_For pf ON pf.PlayerID = pi.PlayerID AND pf.TeamID = t.TeamID
  WHERE pi.MatchID = '&match_id'
  AND m.MatchDate >= pf.StartDate
)
WHERE StartDateRank = 1
ORDER BY MatchID, PlayerID

最后一点:基于WHERE pi.MatchID = '&match_id',您可能正在使用PHP作为前端,并使用mysql函数进行查询.如果是这样,请改为查看mysqliPDO,因为它们可以保护您免受SQL注入的侵害. mysql函数(已正式弃用)不会.

One final point: based on the WHERE pi.MatchID = '&match_id' it looks like you may be using PHP as your front end and the mysql functions to do the query. If so, please look into mysqli or PDO instead, as they'll protect you from SQL Injection. The mysql functions (which are officially deprecated) will not.

附录:有关ROW_NUMBER的更多信息,非常感谢@AndriyM.

Addendum: More information about ROW_NUMBER, with many thanks to @AndriyM.

对于ROW_NUMBER,如果玩家具有多于一个最近日期的行,则仅将其中一行指定为ROW_NUMBER = 1,并且将或多或少地随机选择该行.这是一个示例,其中玩家的最近日期是2013年5月1日,并且该玩家具有该日期的三行:

With ROW_NUMBER, if a player has more than one row with the most recent date, only one of the rows will be assigned as ROW_NUMBER = 1, and that row will be picked more or less randomly. Here's an example, where a player's most recent date is 5/1/2013 and the player has three rows with this date:

pi.MatchID  pi.PlayerID  pf.StartDate
----------  -----------  ------------
      100         1000   05/01/2013 <-- could be ROW_NUMBER = 1
      101         1000   04/29/2013
      105         1000   05/01/2013 <-- could be ROW_NUMBER = 1
      102         1000   05/01/2013 <-- could be ROW_NUMBER = 1 
      107         1000   04/18/2013

请注意,上面的行中只有一个会被分配ROW_NUMBER = 1,并且它可以是其中的任何一个. Oracle将由您决定,而不是您.

Note that only one of the rows above will be assigned ROW_NUMBER = 1, and it can be any of them. Oracle will decide, not you.

如果不确定性是一个问题,请按其他列排序以获得明确的赢家.对于此示例,最高的pi.MatchID将用于确定"true" ROW_NUMBER = 1:

If this uncertainty is a problem, order by additional columns to get a clear winner. For this example, the highest pi.MatchID will be used to determine the "true" ROW_NUMBER = 1:

-- replace `ROW_NUMBER...` in the query above with this:
    ROW_NUMBER() OVER (
      PARTITION BY pi.PlayerID
      ORDER BY pf.StartDate DESC, pi.MatchID DESC) AS StartDateRank

现在,如果要为最高的pf.StartDate设置一个平局,Oracle会在最高的pf.StartDate 的行子集中寻找最高的pi.MatchID .事实证明,只有一行满足以下条件:

Now if there's a tie for the highest pf.StartDate, Oracle looks for the highest pi.MatchID within the subset of rows with the highest pf.StartDate. As it turns out, only one row satisfies this condition:

pi.MatchID  pi.PlayerID  pf.StartDate
----------  -----------  ------------
      100         1000   05/01/2013
      101         1000   04/29/2013
      105         1000   05/01/2013 <-- is ROW_NUMBER = 1: highest MatchID for
                                     -- most recent StartDate (5/1/2013)
      102         1000   05/01/2013
      107         1000   04/18/2013 <-- not considered: has the highest MatchID but isn't
                                     -- in the subset with the most recent StartDate

这篇关于仅在Oracle中仅在一列上有所不同的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆