Should SQL ranking functionality be considered as "use with caution"?

Question

This question originates from a discussion on whether to use SQL ranking functionality or not in a particular case.

Any common RDBMS includes some ranking functionality, i.e. its query language has elements like TOP n ... ORDER BY key, ROW_NUMBER() OVER (ORDER BY key), or ORDER BY key LIMIT n (overview).

They do a great job of improving performance if you want to present only a small chunk out of a huge number of records. But they also introduce a major pitfall: if key is not unique, the results are non-deterministic. Consider the following example:

users

user_id name
1       John
2       Paul
3       George
4       Ringo

logins

login_id user_id login_date
1        4       2009-08-17
2        1       2009-08-18
3        2       2009-08-19
4        3       2009-08-20

A query is supposed to return the person who logged in last:

SELECT TOP 1 users.*
FROM
  logins JOIN
  users ON logins.user_id = users.user_id
ORDER BY logins.login_date DESC

Just as expected, George is returned and everything looks fine. But then a new record is inserted into the logins table:

login_id user_id login_date
1        4       2009-08-17
2        1       2009-08-18
3        2       2009-08-19
4        3       2009-08-20
5        4       2009-08-20

What does the query above return now? Ringo? George? You can't tell. As far as I remember e.g. MySQL 4.1 returns the first record physically created that matches the criteria, i.e. the result would be George. But this may vary from version to version and from DBMS to DBMS. What should have been returned? One might say Ringo since he apparently logged in last but this is pure interpretation. In my opinion both should have been returned, because you can't decide unambiguously from the data available.

Therefore, this query would meet the requirement:

SELECT users.*
FROM
  logins JOIN
  users ON
    logins.user_id = users.user_id AND
    logins.login_date = (
      SELECT max(logins.login_date)
      FROM
        logins JOIN
        users ON logins.user_id = users.user_id)
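
To see the effect on the sample data, here is a minimal sketch that runs an equivalent query against an in-memory SQLite database from Python. SQLite is only a stand-in chosen for convenience, not one of the DBMSs discussed here, and the helper names build_db and last_logins are made up for this illustration. The subquery is simplified to read from logins alone, which yields the same maximum date.

```python
import sqlite3

def build_db():
    # Sample data from the question, including the tied fifth login.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE logins (login_id INTEGER PRIMARY KEY,
                             user_id INTEGER, login_date TEXT);
        INSERT INTO users VALUES (1,'John'),(2,'Paul'),(3,'George'),(4,'Ringo');
        INSERT INTO logins VALUES
            (1,4,'2009-08-17'),(2,1,'2009-08-18'),(3,2,'2009-08-19'),
            (4,3,'2009-08-20'),(5,4,'2009-08-20');
    """)
    return conn

def last_logins(conn):
    # Return every user whose login_date equals the latest login_date,
    # so ties are reported instead of being silently dropped.
    rows = conn.execute("""
        SELECT users.name
        FROM logins JOIN users ON logins.user_id = users.user_id
        WHERE logins.login_date = (SELECT MAX(login_date) FROM logins)
        ORDER BY users.name
    """).fetchall()
    return [name for (name,) in rows]

names = last_logins(build_db())
print(names)  # ['George', 'Ringo']
```

Both tied users come back, and the caller decides how to present more than one row.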

As an alternative some DBMSs provide special functions (e.g. Microsoft SQL Server 2005 introduces TOP n WITH TIES ... ORDER BY key (suggested by gbn), RANK, and DENSE_RANK for this very purpose).
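
As a rough illustration of the RANK approach, the sketch below uses SQLite through Python's sqlite3 module rather than SQL Server. That is an assumption made purely so the example is easy to run; window functions need SQLite 3.25 or newer. The helper names build_db and top_ranked are hypothetical.

```python
import sqlite3

def build_db():
    # Sample data from the question, including the tied fifth login.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE logins (login_id INTEGER PRIMARY KEY,
                             user_id INTEGER, login_date TEXT);
        INSERT INTO users VALUES (1,'John'),(2,'Paul'),(3,'George'),(4,'Ringo');
        INSERT INTO logins VALUES
            (1,4,'2009-08-17'),(2,1,'2009-08-18'),(3,2,'2009-08-19'),
            (4,3,'2009-08-20'),(5,4,'2009-08-20');
    """)
    return conn

def top_ranked(conn):
    # RANK() gives tied rows the same rank, so filtering on rank 1
    # keeps every login that shares the latest date.
    rows = conn.execute("""
        SELECT name FROM (
            SELECT users.name AS name,
                   RANK() OVER (ORDER BY logins.login_date DESC) AS rnk
            FROM logins JOIN users ON logins.user_id = users.user_id
        ) AS ranked
        WHERE rnk = 1
        ORDER BY name
    """).fetchall()
    return [name for (name,) in rows]

ranked_names = top_ranked(build_db())
print(ranked_names)  # ['George', 'Ringo']
```

Unlike ROW_NUMBER, which would number the two tied rows 1 and 2 in an arbitrary order, RANK assigns 1 to both.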

If you search SO for e.g. ROW_NUMBER, you'll find numerous solutions that suggest using ranking functionality but fail to point out the possible problems.

Question: What advice should be given when a solution involving ranking functionality is proposed?

Recommended answer

To summarize:

  • Use your head first. Should be obvious, but it is always a good point to start. Do you expect n rows exactly or do you expect a possibly varying number of rows that fulfill a constraint? Reconsider your design. If you're expecting n rows exactly, your model might be designed poorly if it's impossible to identify a row unambiguously. If you expect a possibly varying number of rows, you might need to adjust your UI in order to present your query results.
  • Add columns to key that make it unique (e.g. PK). You at least gain back control on the returned result. There is almost always a way to do this as Quassnoi pointed out.
  • Consider using possibly more suitable functions like RANK, DENSE_RANK and TOP n WITH TIES. They are available in Microsoft SQL Server as of the 2005 version, and RANK and DENSE_RANK are also available in PostgreSQL from 8.4 onwards. If these functions are not available, consider using nested queries with aggregation instead of ranking functions.
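
The second point, making the sort key unique, can be sketched as follows. This again uses SQLite via Python as a convenient stand-in, with LIMIT 1 playing the role of TOP 1. It assumes that login_id grows with insertion order, so a higher login_id means a later login; that assumption is not stated in the question, and the helper names are hypothetical.

```python
import sqlite3

def build_db():
    # Sample data from the question, including the tied fifth login.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE logins (login_id INTEGER PRIMARY KEY,
                             user_id INTEGER, login_date TEXT);
        INSERT INTO users VALUES (1,'John'),(2,'Paul'),(3,'George'),(4,'Ringo');
        INSERT INTO logins VALUES
            (1,4,'2009-08-17'),(2,1,'2009-08-18'),(3,2,'2009-08-19'),
            (4,3,'2009-08-20'),(5,4,'2009-08-20');
    """)
    return conn

def latest_login(conn):
    # Adding the primary key to ORDER BY makes the sort key unique,
    # so the single returned row no longer depends on the engine.
    # Assumption: a higher login_id means a later login.
    row = conn.execute("""
        SELECT users.name
        FROM logins JOIN users ON logins.user_id = users.user_id
        ORDER BY logins.login_date DESC, logins.login_id DESC
        LIMIT 1
    """).fetchone()
    return row[0]

winner = latest_login(build_db())
print(winner)  # Ringo
```

The result is now reproducible, but it encodes a decision about which tied row wins, so that decision should be made deliberately.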
