过滤对crosstab()查询结果的意外影响 [英] Unexpected effect of filtering on result from crosstab() query
问题描述
我有一个 crosstab()
查询,如下所示:
I have a crosstab()
query like the following:
SELECT *
FROM crosstab(
'SELECT row_name, extra1, extra2..., another_table.category, value
FROM table t
JOIN another_table ON t.field_id = another_table.field_id
WHERE t.field = certain_value AND t.extra1 = val1
ORDER BY row_name ASC',
'SELECT category_name FROM category_name WHERE field = certain_value'
) AS ct(row_name text, extra1 text, extra2 text, ...)
简化示例,实际查询确实很复杂,并且包含重要信息。上面的查询使用 table.extra1 = val1
过滤后返回 N 个结果行。
Simplified example, the actual query is really complex and contains important information. The above query returns N result rows after filtering with table.extra1 = val1
.
当我如下更改查询时:
SELECT *
FROM crosstab(
'SELECT row_name, extra1, extra2..., another_table.category, value
FROM table t
JOIN another_table ON t.field_id = another_table.field_id
WHERE t.field = certain_value AND t.extra1 IN (val1, ...) --> more values
ORDER BY row_name ASC',
'SELECT category_name FROM category_name WHERE field = certain_value'
) AS ct(row_name text, extra1 text, extra2 text, ...)
WHERE extra1 = val1; --> condition on the result
添加了更多可能的值 table.extra1 IN (val1,...)
和最终条件 WHERE extra1 = val1
。现在我得到的行比原始行少了。更糟糕的是,如果我向 IN(val1,...)
添加更多值,则会得到更少行。为什么?
Added more possible values table.extra1 IN (val1, ...)
and a final condition WHERE extra1 = val1
. Now I get fewer rows than from the original one. To make it worse, if I add yet more values to IN (val1, ...)
, I get yet fewer rows. Why is that?
推荐答案
extra1,extra2,...
是交叉表术语中的额外的列 。
tablefunc模块的手册解释了以下规则:
extra1, extra2, ...
are "extra columns" in crosstab terminology.
The manual for the tablefunc module explains the rules:
它也可能有一个或多个额外列。
row_name
列必须以
首先。类别和value
列必须是最后两列,即该顺序的
。在row_name
和category
之间的任何列都将
视为额外。 对于所有具有相同row_name
值的行,额外列均应相同。
It may also have one or more "extra" columns. The
row_name
column must be first. The category andvalue
columns must be the last two columns, in that order. Any columns betweenrow_name
andcategory
are treated as "extra". The "extra" columns are expected to be the same for all rows with the samerow_name
value.
并进一步向下:
输出
row_name
列,以及所有额外列,都是从组的第一行复制。
The output
row_name
column, plus any "extra" columns, are copied from the first row of the group.
我大胆强调关键部分。
您只能按 row_name
进行排序:
ORDER BY row_name ASC
在第一个示例中使用以下内容进行过滤并不重要:
Does not matter in the first example where you filter with:
WHERE ... t.extra1 = 'val1' -- single quotes by me
所有输入行都有 extra1 ='val1'
。但是在第二个示例中,使用以下内容进行过滤很重要:
All input row have extra1 = 'val1'
anyway. But it matters in the second example where you filter with:
WHERE ... t.extra1 IN('val1', ...) --> More values
现在,额外的列违反了上面的第一个粗体要求extra1
。尽管第一个输入查询的排序顺序是不确定的,但是额外列 extra1
的结果值是任意选择的。 extra1
的可能值越多,最终具有'val1'的行越少:这就是您所观察到的。
Now, the first bolded requirement above is violated for the extra column extra1
. While the sort order of the first input query is non-deterministic, resulting values for the "extra" column extra1
are picked arbitrarily. The more possible values for extra1
, the fewer rows will end up having 'val1': that's what you observed.
您仍然可以使用它:对于每个至少具有以下一项的 row_name
报告 extra1 ='val1'
将 ORDER BY
更改为:
You can still make it work: to report extra1 = 'val1'
for every row_name
that has at least one of those, change the ORDER BY
to:
ORDER BY row_name, (extra1 <> 'val1')
在顶部排列'val1'。 布尔值
表达式的解释(带有更多链接):
Sorts 'val1' on top. Explanation for that boolean
expression (with links to more):
- PostgreSQL: Grouping then filtering table, with condition for nonexistence
在排序顺序不确定的情况下,仍然可以任意选择其他额外列。
Other "extra" columns are still chosen arbitrarily while the sort order is not deterministic.
交叉表基础知识
- PostgreSQL Crosstab Query
- Postgresql crosstab query with multiple "row name" columns
这篇关于过滤对crosstab()查询结果的意外影响的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!