Is it safe to include extra columns in the SELECT list of a SQLite GROUP BY query?


Problem description

I have a simple SQLite table called "message":

sequence INTEGER PRIMARY KEY
type TEXT
content TEXT

I want to get the content of the last message of each type (as determined by its sequence). To my surprise, the following simple query works:

SELECT MAX(sequence), type, content
FROM message
GROUP BY type

This surprised me, because I know that MSSQL or Postgres would reject a column in the SELECT list that is neither part of the GROUP BY clause nor wrapped in an aggregate function, and I'd have to do a join, like this:

SELECT m.sequence, m.type, m.content
FROM
(
    SELECT MAX(sequence) as sequence, type
    FROM message
    GROUP BY type
) g
JOIN message m
ON g.sequence = m.sequence

My question is: is it safe to use the first, much simpler, form of the query in SQLite? It intuitively makes sense that it selects the "content" value that matches the "MAX(sequence)" value, but the documentation doesn't seem to talk about this at all. Of course, if sequence was not unique then the result would be undefined. But if sequence is unique, as in my case, is this guaranteed or is it simply a lucky implementation detail that's subject to change?
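The portable join form from the question can be tried out with Python's built-in sqlite3 module. This is only a minimal sketch: the sample rows are invented for illustration, and the join condition assumes the column is named sequence, as in the table definition.

```python
import sqlite3

# In-memory database with the table from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE message (
        sequence INTEGER PRIMARY KEY,
        type     TEXT,
        content  TEXT
    );
    -- Hypothetical sample rows, not from the original post.
    INSERT INTO message (sequence, type, content) VALUES
        (1, 'info',  'first info'),
        (2, 'error', 'first error'),
        (3, 'info',  'second info'),
        (4, 'error', 'second error'),
        (5, 'info',  'third info');
""")

# Find MAX(sequence) per type, then join back to fetch the full row.
latest = conn.execute("""
    SELECT m.sequence, m.type, m.content
    FROM (
        SELECT MAX(sequence) AS sequence, type
        FROM message
        GROUP BY type
    ) g
    JOIN message m ON g.sequence = m.sequence
    ORDER BY m.type
""").fetchall()

for row in latest:
    print(row)
```

Because sequence is the primary key, each group matches exactly one row, so this form does not depend on how SQLite picks values for ungrouped columns.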

Solution

You can use these queries "safely," that is, without getting ambiguous results, if the extra columns are functionally dependent on the column(s) you group by:

SELECT c.parent_id, COUNT(*), p.any_column
FROM child_table c 
JOIN parent_table p USING (parent_id)
GROUP BY c.parent_id;

The example above would work in SQLite, and produce an unambiguous result, because there's no way p.any_column could have multiple values per group. However, this query is strictly in violation of the SQL standard, and most brands of RDBMS would raise an error.
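To see the functionally dependent case in action, here is a small sketch using Python's sqlite3 module. The table layouts and sample rows are assumptions made up for the illustration; only the query itself comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schemas: any_column is determined by parent_id
    -- because parent_id is the primary key of parent_table.
    CREATE TABLE parent_table (
        parent_id  INTEGER PRIMARY KEY,
        any_column TEXT
    );
    CREATE TABLE child_table (
        child_id  INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent_table(parent_id)
    );
    INSERT INTO parent_table VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO child_table VALUES (10, 1), (11, 1), (12, 1), (20, 2);
""")

# Every row in a group shares the same parent row, so p.any_column
# has only one possible value per group.
rows = conn.execute("""
    SELECT c.parent_id, COUNT(*), p.any_column
    FROM child_table c
    JOIN parent_table p USING (parent_id)
    GROUP BY c.parent_id
    ORDER BY c.parent_id
""").fetchall()

for row in rows:
    print(row)
```

Whichever row of a group SQLite happens to read any_column from, the value is the same, which is why the result is unambiguous here.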

It's too easy to write a query that produces ambiguous results, though. When you name a column that has multiple values per group, you can't control which value is returned in your result set.

In practice, MySQL returns the value from the first row with respect to physical storage, and SQLite returns the value from the last row. But it's totally implementation-dependent and not reliable. If the next version of either software changes its internals, you could get different query results after you upgrade. So it's best not to rely on this behavior.


In your example, content should "intuitively" take its value from the row where sequence is MAX. But is this really intuitive? Consider these other cases:

SELECT MAX(sequence), MIN(sequence), type, content
FROM message
GROUP BY type

So which row now supplies the value for content? The row where sequence is MAX, or the row where sequence is MIN?

What if you use a non-unique column (e.g. date), and there are multiple rows with the same MAX value for date, but different values for content?

SELECT MAX(date), type, content
FROM message
GROUP BY type
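One way to make the tie visible rather than letting the engine pick a row is to join back on the aggregate, as in the following sketch. It assumes a hypothetical date column added to the message table and uses invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed variant of the message table with a non-unique date column.
    CREATE TABLE message (
        sequence INTEGER PRIMARY KEY,
        type     TEXT,
        content  TEXT,
        date     TEXT
    );
    INSERT INTO message VALUES
        (1, 'info',  'morning note', '2024-01-01'),
        (2, 'info',  'evening note', '2024-01-02'),
        (3, 'info',  'late note',    '2024-01-02'),
        (4, 'error', 'disk full',    '2024-01-01');
""")

# Joining on (type, MAX(date)) returns every row that ties for the
# latest date, instead of silently choosing one content value.
tied = conn.execute("""
    SELECT m.type, m.date, m.content
    FROM (
        SELECT type, MAX(date) AS date
        FROM message
        GROUP BY type
    ) g
    JOIN message m ON m.type = g.type AND m.date = g.date
    ORDER BY m.type, m.sequence
""").fetchall()

for row in tied:
    print(row)
```

The 'info' group comes back with two rows, which shows that a single content value per group is not well defined for this data.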

What about other aggregate functions like AVG() or SUM()? It could be that the value of the aggregate corresponds to no individual row in the table. So now which row should supply the value for content?

SELECT AVG(sequence), type, content
FROM message
GROUP BY type
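The AVG() case can be checked with a short sketch: with the invented sample rows below, neither group's average equals the sequence of any actual row, so there is no natural row for content to come from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE message (
        sequence INTEGER PRIMARY KEY,
        type     TEXT,
        content  TEXT
    );
    -- Hypothetical sample rows chosen so the averages are fractional.
    INSERT INTO message VALUES
        (1, 'info',  'a'),
        (2, 'info',  'b'),
        (4, 'info',  'c'),
        (3, 'error', 'd'),
        (6, 'error', 'e');
""")

avg_by_type = conn.execute("""
    SELECT type, AVG(sequence)
    FROM message
    GROUP BY type
    ORDER BY type
""").fetchall()

# Collect the sequence values that actually exist in the table.
sequences = {row[0] for row in conn.execute("SELECT sequence FROM message")}

# Groups whose average does not correspond to any stored row.
unmatched = [(t, avg) for t, avg in avg_by_type if avg not in sequences]

for t, avg in unmatched:
    print(t, avg)
```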
