SQL vs MySQL: Rules about aggregate operations and GROUP BY

Question

In this book I'm currently reading while following a course on databases, the following example of an illegal query using an aggregate operator is given:

Find the name and age of the oldest sailor.

Consider the following attempt to answer this query:

SELECT S.sname, MAX(S.age)
FROM Sailors S

The intent is for this query to return not only the maximum age but also the name of the sailors having that age. However, this query is illegal in SQL--if the SELECT clause uses an aggregate operation, then it must use only aggregate operations unless the query contains a GROUP BY clause!

Some time later while doing an exercise using MySQL, I faced a similar problem, and made a mistake similar to the one mentioned. However, MySQL didn't complain and just spit out some tables which later turned out not to be what I needed.

Is the query above really illegal in SQL, but legal in MySQL, and if so, why is that? In what situation would one need to make such a query?

Further elaboration on the question:

The question isn't about whether or not all attributes mentioned in a SELECT should also be mentioned in a GROUP BY. It's about why the above query, using attributes together with aggregate operations on attributes, without any GROUP BY, is legal in MySQL.

Let's say the Sailors table looked like this:

+----------+------+
| sname    | age  |
+----------+------+
| John Doe |   30 |
| Jane Doe |   50 |
+----------+------+

The query would return:

+----------+------------+
| sname    | MAX(S.age) |
+----------+------------+
| John Doe |         50 |
+----------+------------+

Now who would need that? John Doe ain't 50, he's 30! As stated in the citation from the book, this is a first attempt to get the name and age of the oldest sailor, in this example, Jane Doe at the age of 50.

SQL would say this query is illegal, but MySQL just proceeds and spits out "garbage". Who would need this kind of result? Why does MySQL allow this little trap for newcomers?

Answer

Based on a link which a_horse_with_no_name provided in a comment, I have arrived at my own answer:

It seems that the MySQL way of using GROUP BY differs from the SQL way in order to permit leaving columns out of the GROUP BY clause when they are functionally dependent on other included columns anyway.

Let's say we have a table displaying the activity of a bank account. It's not a very thought-out table, but it's the only one we have, and that will have to do. Instead of keeping track of an amount, we imagine an account starts at '0', and all transactions to it are recorded instead, so the amount is the sum of the transactions. The table could look like this:

+------------+----------+-------------+
| costumerID | name     | transaction |
+------------+----------+-------------+
|       1337 | h4x0r    |         101 |
|         42 | John Doe |         500 |
|       1337 | h4x0r    |        -101 |
|         42 | John Doe |        -200 |
|         42 | John Doe |         500 |
|         42 | John Doe |        -200 |
+------------+----------+-------------+

It is clear that the 'name' is functionally dependent on the 'costumerID'. (The other way around would also be possible in this example.)

What if we wanted to know the costumerID, name and current amount of each customer?

In such a situation, two very similar queries would return the following correct result:

+------------+----------+--------+
| costumerID | name     | amount |
+------------+----------+--------+
|         42 | John Doe |    600 |
|       1337 | h4x0r    |      0 |
+------------+----------+--------+

This query can be executed in MySQL, and is legal according to SQL.

SELECT costumerID, name, SUM(transaction) AS amount
FROM Activity
GROUP BY costumerID, name

This query can be executed in MySQL, and is NOT legal according to SQL.

SELECT costumerID, name, SUM(transaction) AS amount
FROM Activity
GROUP BY costumerID

The following line would make the query return an error instead, since it would now have to follow the SQL way of using aggregate operations and GROUP BY:

SET sql_mode = 'ONLY_FULL_GROUP_BY';
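As a rough sketch of how that setting behaves in practice: the statement above replaces whatever modes were already active, so it can be gentler to append to the current value instead. The table definition below is an assumption (the answer never gives column types), and as far as I know ONLY_FULL_GROUP_BY has been part of the default sql_mode since MySQL 5.7.5, so on newer servers the last query may already be rejected without any extra setup.

```sql
-- Assumed setup: the Activity table from the answer; the column types are guesses.
DROP TABLE IF EXISTS Activity;
CREATE TABLE Activity (
    costumerID    INT         NOT NULL,
    name          VARCHAR(50) NOT NULL,
    `transaction` INT         NOT NULL
);
INSERT INTO Activity (costumerID, name, `transaction`) VALUES
    (1337, 'h4x0r', 101), (42, 'John Doe', 500), (1337, 'h4x0r', -101),
    (42, 'John Doe', -200), (42, 'John Doe', 500), (42, 'John Doe', -200);

-- Inspect the modes that are active for the current session.
SELECT @@SESSION.sql_mode;

-- Append ONLY_FULL_GROUP_BY instead of overwriting the modes already set.
SET SESSION sql_mode = CONCAT(@@SESSION.sql_mode, ',ONLY_FULL_GROUP_BY');

-- Expected to be rejected with an error now: name is neither aggregated nor
-- listed in GROUP BY, and nothing in the schema ties it to costumerID.
SELECT costumerID, name, SUM(`transaction`) AS amount
FROM Activity
GROUP BY costumerID;
```

The change only lasts for the current session; the first query (GROUP BY costumerID, name) keeps working either way.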

The argument for allowing the second query in MySQL seems to be that all columns mentioned in SELECT but not in GROUP BY are assumed to be either used inside an aggregate operation (the case with 'transaction') or functionally dependent on other included columns (the case with 'name'). In the case of 'name', we can be sure that the correct 'name' is chosen for all group entries, since it is functionally dependent on 'costumerID', and therefore there is only one possible name for each group of costumerIDs.
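The functional dependency can also be declared in the schema rather than merely assumed. The sketch below splits the data into a hypothetical Customer table (not part of the original example) with costumerID as its primary key. To my knowledge, MySQL 5.7.5 and later detect this kind of dependency, so the shortened GROUP BY should be accepted even with ONLY_FULL_GROUP_BY enabled; older versions do no such check.

```sql
-- Hypothetical normalized schema; table and column types are assumptions.
DROP TABLE IF EXISTS Activity;
DROP TABLE IF EXISTS Customer;

CREATE TABLE Customer (
    costumerID INT         PRIMARY KEY,   -- the key makes name depend on costumerID
    name       VARCHAR(50) NOT NULL
);
CREATE TABLE Activity (
    costumerID    INT NOT NULL,
    `transaction` INT NOT NULL
);

INSERT INTO Customer (costumerID, name) VALUES (1337, 'h4x0r'), (42, 'John Doe');
INSERT INTO Activity (costumerID, `transaction`) VALUES
    (1337, 101), (42, 500), (1337, -101), (42, -200), (42, 500), (42, -200);

-- name is left out of GROUP BY, but it is determined by the primary key
-- c.costumerID, so each group has exactly one possible name.
SELECT c.costumerID, c.name, SUM(a.`transaction`) AS amount
FROM Customer AS c
JOIN Activity AS a ON a.costumerID = c.costumerID
GROUP BY c.costumerID;
```

With the sample rows this gives an amount of 600 for customer 42 and 0 for customer 1337, the same totals as in the result table above.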

This way of using GROUP BY seems flawed, though, since it doesn't do any further checks on what is left out of the GROUP BY clause. People can pick and choose columns from their SELECT statement to put in their GROUP BY clause as they see fit, even if it makes no sense to include or leave out any particular column.
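One way to avoid relying on that unchecked assumption is to say explicitly which value is wanted for the left-out column. The sketch below shows two options, again assuming the Activity table from above with guessed column types: wrapping the column in an ordinary aggregate such as MIN, which should work in any SQL dialect, or using ANY_VALUE(), which as far as I know exists in MySQL 5.7.5 and later and states openly that any value from the group will do.

```sql
-- Assumed setup: the Activity table from the answer; the column types are guesses.
DROP TABLE IF EXISTS Activity;
CREATE TABLE Activity (
    costumerID    INT         NOT NULL,
    name          VARCHAR(50) NOT NULL,
    `transaction` INT         NOT NULL
);
INSERT INTO Activity (costumerID, name, `transaction`) VALUES
    (1337, 'h4x0r', 101), (42, 'John Doe', 500), (1337, 'h4x0r', -101),
    (42, 'John Doe', -200), (42, 'John Doe', 500), (42, 'John Doe', -200);

-- Option 1: aggregate the dependent column. Every row in a group carries the
-- same name here, so MIN(name) simply returns that name.
SELECT costumerID, MIN(name) AS name, SUM(`transaction`) AS amount
FROM Activity
GROUP BY costumerID;

-- Option 2 (MySQL-specific): declare that an arbitrary value from the group is fine.
SELECT costumerID, ANY_VALUE(name) AS name, SUM(`transaction`) AS amount
FROM Activity
GROUP BY costumerID;
```

Both forms pass under ONLY_FULL_GROUP_BY, and both make the "pick one" decision visible in the query instead of leaving it to the server.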

The Sailors example illustrates this flaw very well. When using aggregate operators (possibly in conjunction with GROUP BY), each group entry in the returned set has only one value for each of its columns. In the case of Sailors, since the GROUP BY clause is left out, the whole table is put into one single group entry. This entry needs a name and a maximum age. Choosing a maximum age for this entry is a no-brainer, since MAX(S.age) only returns one value. In the case of S.sname though, which is only mentioned in SELECT, there are now as many choices as there are unique snames in the whole Sailors table (in this case two, John and Jane Doe). MySQL doesn't have any clue which to choose, we didn't give it any, and it didn't hit the brakes in time, so it just has to pick whatever comes first (John Doe in the result above). If the two rows were switched, it would actually give "the right answer" by accident.

It just seems plain dumb that something like this is allowed in MySQL: the result of a query using GROUP BY could potentially depend on the ordering of the table if something is left out of the GROUP BY clause. Apparently, that's just how MySQL rolls. But still, couldn't it at least have the courtesy of warning us when it has no clue what it's doing because of a "flawed" query? I mean, sure, if you give the wrong instructions to a program, it probably wouldn't (or shouldn't) do as you want, but if you give unclear instructions, I certainly wouldn't want it to just start guessing or pick whatever comes first... -_-'
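For completeness, here is a sketch of a query that does answer the original exercise without mixing a bare column with an aggregate. The table definition is an assumption based on the two-row Sailors example in the question; the idea is to compute the maximum age in a subquery and then select the rows that match it.

```sql
-- Assumed setup: the two-row Sailors table from the question; types are guesses.
DROP TABLE IF EXISTS Sailors;
CREATE TABLE Sailors (
    sname VARCHAR(50) NOT NULL,
    age   INT         NOT NULL
);
INSERT INTO Sailors (sname, age) VALUES ('John Doe', 30), ('Jane Doe', 50);

-- The aggregate lives in its own subquery, so the outer SELECT contains
-- only plain columns and is legal in standard SQL as well as in MySQL.
SELECT S.sname, S.age
FROM Sailors S
WHERE S.age = (SELECT MAX(S2.age) FROM Sailors S2);
-- Returns the single row: Jane Doe, 50
```

If several sailors share the maximum age, this returns all of them, which also makes the result independent of the order of the rows in the table.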
