Does ORDER BY apply before or after DISTINCT?

Question

In a MySQL query, when using the DISTINCT option, does ORDER BY apply after the duplicates are removed? If not, is there any way to make it do so? I think it's causing some issues with my code.

EDIT:
Here's some more information about what's causing my problem. I understand that, at first glance, this order would not be important, since I am dealing with duplicate rows. However, this is not entirely the case, since I am using an INNER JOIN to sort the rows.

Say I have a table of forum threads, containing this data:

+----+--------+-------------+
| id | userid |    title    |
+----+--------+-------------+
|  1 |      1 | Information |
|  2 |      1 | FAQ         |
|  3 |      2 | Support     |
+----+--------+-------------+

I also have a set of posts in another table like this:

+----+----------+--------+---------+
| id | threadid | userid | content |
+----+----------+--------+---------+
|  1 |        1 |      1 | Lorem   |
|  2 |        1 |      2 | Ipsum   |
|  3 |        2 |      2 | Test    |
|  4 |        3 |      1 | Foo     |
|  5 |        2 |      3 | Bar     |
|  6 |        3 |      5 | Bob     |
|  7 |        1 |      2 | Joe     |
+----+----------+--------+---------+

I am using the following MySQL query to get all threads, then sort them based on the latest post (assuming that posts with higher ids are more recent):

SELECT t.*
FROM Threads t
INNER JOIN Posts p ON t.id = p.threadid
ORDER BY p.id DESC

This works, and generates something like this:

+----+--------+-------------+
| id | userid |    title    |
+----+--------+-------------+
|  1 |      1 | Information |
|  3 |      2 | Support     |
|  2 |      1 | FAQ         |
|  3 |      2 | Support     |
|  2 |      1 | FAQ         |
|  1 |      1 | Information |
|  1 |      1 | Information |
+----+--------+-------------+

However, as you can see, the information is correct, but there are duplicate rows. I'd like to remove such duplicates, so I used SELECT DISTINCT instead. However, this yielded the following:

+----+--------+-------------+
| id | userid |    title    |
+----+--------+-------------+
|  3 |      2 | Support     |
|  2 |      1 | FAQ         |
|  1 |      1 | Information |
+----+--------+-------------+

This is obviously wrong, since the "Information" thread should be on top. It would seem that using DISTINCT causes the duplicates to be removed from the top to the bottom, so only the final rows are left. This causes some issues in the sorting.

Is this the case, or am I analyzing things incorrectly?

Recommended answer

Two things to understand:

  1. Generally speaking, resultsets are unordered unless you specify an ORDER BY clause; to the extent that you specify a non-strict order (i.e. ORDER BY over non-unique columns), the order in which records that are equal under that ordering appear within the resultset is undefined.

I suspect you may be specifying such a non-strict order, which is the root of your problems: ensure that your ordering is strict by specifying ORDER BY over a set of columns that is sufficient to uniquely identify each record for which you care about its final position in the resultset.

  2. DISTINCT may be implemented using GROUP BY, which (in MySQL versions before 8.0) causes the results to be ordered by the grouped columns; that is, SELECT DISTINCT a, b, c FROM t will produce a resultset that appears as though ORDER BY a, b, c has been applied. Again, specifying a sufficiently strict order to meet your needs will override this effect.
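To illustrate the first point, here is a minimal sketch of a strict ordering. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs without a server), and the function name threads_in_strict_order is made up for illustration. The Threads data is copied from the question.

```python
import sqlite3


def threads_in_strict_order():
    """Return the question's Threads rows under a strict (fully determined) order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Threads (id INTEGER PRIMARY KEY, userid INTEGER, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO Threads VALUES (?, ?, ?)",
        [(1, 1, "Information"), (2, 1, "FAQ"), (3, 2, "Support")],
    )
    # ORDER BY userid alone is non-strict: threads 1 and 2 share userid 1,
    # so their relative position would be undefined. Adding the unique id
    # column as a tie-breaker makes the order strict.
    rows = conn.execute(
        "SELECT id, userid, title FROM Threads ORDER BY userid, id DESC"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in threads_in_strict_order():
        print(row)
```

Because id is unique, every row's position is fully determined by the ORDER BY clause, whatever the engine does internally.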


Following your update, bearing in mind my point #2 above, it is clear that the effect of grouping the results to achieve DISTINCT makes it impossible to then order by the non-grouped column p.id; instead, you want:

SELECT   t.*
FROM     Threads t INNER JOIN Posts p ON t.id = p.threadid
GROUP BY t.id
ORDER BY MAX(p.id) DESC
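The query above can be tried against the sample data from the question. The following is only a sketch: it uses Python's built-in sqlite3 module rather than MySQL (an assumption, chosen so the example is self-contained), and the helper names make_db and threads_by_latest_post are hypothetical. The table and column names are taken from the question.

```python
import sqlite3

# Sample data copied from the question.
THREADS = [(1, 1, "Information"), (2, 1, "FAQ"), (3, 2, "Support")]
POSTS = [
    (1, 1, 1, "Lorem"),
    (2, 1, 2, "Ipsum"),
    (3, 2, 2, "Test"),
    (4, 3, 1, "Foo"),
    (5, 2, 3, "Bar"),
    (6, 3, 5, "Bob"),
    (7, 1, 2, "Joe"),
]


def make_db():
    """Create an in-memory database holding the question's two tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Threads (id INTEGER PRIMARY KEY, userid INTEGER, title TEXT)"
    )
    conn.execute(
        "CREATE TABLE Posts (id INTEGER PRIMARY KEY, threadid INTEGER, "
        "userid INTEGER, content TEXT)"
    )
    conn.executemany("INSERT INTO Threads VALUES (?, ?, ?)", THREADS)
    conn.executemany("INSERT INTO Posts VALUES (?, ?, ?, ?)", POSTS)
    return conn


def threads_by_latest_post(conn):
    """One row per thread, ordered by that thread's most recent post id."""
    return conn.execute(
        """
        SELECT   t.*
        FROM     Threads t INNER JOIN Posts p ON t.id = p.threadid
        GROUP BY t.id
        ORDER BY MAX(p.id) DESC
        """
    ).fetchall()


if __name__ == "__main__":
    for row in threads_by_latest_post(make_db()):
        print(row)
```

Grouping by t.id collapses the duplicates, and MAX(p.id) gives each group a single, well-defined sort key, so the "Information" thread (latest post id 7) comes first, followed by "Support" (6) and "FAQ" (5).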
