SQL two criteria from one group-by
I have a table with some "functionally duplicate" records: different IDs, but 4 columns of "user data" (out of even more columns) are identical. I have a working query that selects all records that have such duplicates.
Now I want to select one row from each group of duplicates: first, any row that has column A not null (I've verified from the data that there is at most 1 such row per group), and if there is none in that particular group, the row with the minimum of column ID.
How do I select that? I can't mix a non-aggregate in the THEN of a CASE with an aggregate in the ELSE. E.g. this doesn't work:
SELECT CASE
           WHEN d.A IS NULL THEN d.ID
           ELSE MIN(d.ID)
       END,
       d.B,
       d.C,
       d.E,
       d.F
FROM TABLE T
JOIN (my duplicate query here) D ON T.B = D.B
                                AND T.C = D.C
                                AND T.E = D.E
                                AND T.F = D.F
GROUP BY T.B,
         T.C,
         T.E,
         T.F
Error being:
column A must appear in the GROUP BY clause or be used in an aggregate function.
This can be radically simpler:
SELECT DISTINCT ON (b, c, e, f)
       b, c, e, f, id   -- add more columns freely
FROM   (<duplicate query here>) sub
ORDER  BY b, c, e, f, (a IS NOT NULL), id
- Your duplicate query has all columns. No need to JOIN to the base table again.
- Use the Postgres extension of the standard SQL DISTINCT: DISTINCT ON.
- Postgres has a proper boolean type. You can ORDER BY a boolean expression directly. The sort order is FALSE (0), TRUE (1), NULL (NULL). If a is NULL, the expression (a IS NOT NULL) is FALSE and sorts first. The rest is ordered by id. Voilà.
- Selection of ID happens automatically. According to your description you want the ID of the row selected in this query. Nothing more to do.
- You can probably integrate this into your duplicate query directly.
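To see how the boolean sort key decides which row DISTINCT ON keeps, here is a minimal sketch for Postgres. The temp table dup and its sample rows are hypothetical stand-ins for the result of the duplicate query, and the column types are assumptions. The first query uses the ordering shown above. The second flips the expression to (a IS NULL), which may be closer to the stated goal of preferring the row where a is not null, if that is indeed what is wanted.

```sql
-- Hypothetical sample data standing in for the output of the duplicate query.
CREATE TEMP TABLE dup (id int, a text, b int, c int, e int, f int);

INSERT INTO dup (id, a, b, c, e, f) VALUES
    (1, NULL, 1, 1, 1, 1),
    (2, 'x',  1, 1, 1, 1),   -- the only row in group (1,1,1,1) with a set
    (3, NULL, 2, 2, 2, 2),
    (4, NULL, 2, 2, 2, 2);   -- group (2,2,2,2) has no row with a set

-- Ordering as written above: (a IS NOT NULL) is FALSE for NULL a,
-- so rows with a NULL sort first within each group.
-- Keeps id 1 for group (1,1,1,1) and id 3 for group (2,2,2,2).
SELECT DISTINCT ON (b, c, e, f)
       b, c, e, f, id
FROM   dup
ORDER  BY b, c, e, f, (a IS NOT NULL), id;

-- Flipped expression: (a IS NULL) is FALSE when a is set,
-- so the row with a set sorts first; otherwise the smallest id wins.
-- Keeps id 2 for group (1,1,1,1) and id 3 for group (2,2,2,2).
SELECT DISTINCT ON (b, c, e, f)
       b, c, e, f, id
FROM   dup
ORDER  BY b, c, e, f, (a IS NULL), id;
```

In both queries the leading ORDER BY expressions have to match the DISTINCT ON list. The extra sort keys after them only decide which row of each group comes first and is therefore kept.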