过滤对crosstab()查询结果的意外影响 [英] Unexpected effect of filtering on result from crosstab() query

查看:122
本文介绍了过滤对crosstab()查询结果的意外影响的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个 crosstab()查询,如下所示:

I have a crosstab() query like the following:

SELECT *
FROM crosstab(
 'SELECT row_name, extra1, extra2..., another_table.category, value
  FROM   table t
  JOIN   another_table ON t.field_id = another_table.field_id
  WHERE  t.field = certain_value AND t.extra1 = val1
  ORDER  BY row_name ASC',
 'SELECT category_name FROM category_name WHERE field = certain_value'
) AS ct(row_name text, extra1 text, extra2 text, ...)

简化示例,实际查询确实很复杂,并且包含重要信息。上面的查询使用 table.extra1 = val1 过滤后返回 N 个结果行。

Simplified example, the actual query is really complex and contains important information. The above query returns N result rows after filtering with table.extra1 = val1.

当我如下更改查询时:

SELECT *
FROM crosstab(
 'SELECT row_name, extra1, extra2..., another_table.category, value
  FROM   table t
  JOIN   another_table ON t.field_id = another_table.field_id
  WHERE  t.field = certain_value AND t.extra1 IN (val1, ...) --> more values
  ORDER  BY row_name ASC',
 'SELECT category_name FROM category_name WHERE field = certain_value'
) AS ct(row_name text, extra1 text, extra2 text, ...)
WHERE extra1 = val1; --> condition on the result

添加了更多可能的值 table.extra1 IN (val1,...)和最终条件 WHERE extra1 = val1 。现在我得到的行比原始行少了。更糟糕的是,如果我向 IN(val1,...)添加更多值,则会得到更少行。为什么?

Added more possible values table.extra1 IN (val1, ...) and a final condition WHERE extra1 = val1. Now I get fewer rows than from the original one. To make it worse, if I add yet more values to IN (val1, ...), I get yet fewer rows. Why is that?

推荐答案

extra1,extra2,... 是交叉表术语中的额外的列

tablefunc模块的手册解释了以下规则:

extra1, extra2, ... are "extra columns" in crosstab terminology.
The manual for the tablefunc module explains the rules:


它也可能有一个或多个额外列。 row_name 列必须以
首先。类别和 value 列必须是最后两列,即该顺序的
。在 row_name category 之间的任何列都将
视为额外。 对于所有具有相同 row_name 值的行,额外列均应相同。

It may also have one or more "extra" columns. The row_name column must be first. The category and value columns must be the last two columns, in that order. Any columns between row_name and category are treated as "extra". The "extra" columns are expected to be the same for all rows with the same row_name value.

并进一步向下:


输出 row_name 列,以及所有额外列,都是从组的第一行复制

The output row_name column, plus any "extra" columns, are copied from the first row of the group.

我大胆强调关键部分。

您只能按 row_name 进行排序:

ORDER  BY row_name ASC

在第一个示例中使用以下内容进行过滤并不重要:

Does not matter in the first example where you filter with:

WHERE ... t.extra1 = 'val1'  -- single quotes by me

所有输入行都有 extra1 ='val1' 。但是在第二个示例中,使用以下内容进行过滤很重要:

All input row have extra1 = 'val1' anyway. But it matters in the second example where you filter with:

WHERE ... t.extra1 IN('val1', ...) --> More values

现在,额外的列违反了上面的第一个粗体要求extra1 。尽管第一个输入查询的排序顺序是不确定的,但是额外列 extra1 的结果值是任意选择的。 extra1 的可能值越多,最终具有'val1'的行越少:这就是您所观察到的。

Now, the first bolded requirement above is violated for the extra column extra1. While the sort order of the first input query is non-deterministic, resulting values for the "extra" column extra1 are picked arbitrarily. The more possible values for extra1, the fewer rows will end up having 'val1': that's what you observed.

您仍然可以使用它:对于每个至少具有以下一项的 row_name 报告 extra1 ='val1' ORDER BY 更改为:

You can still make it work: to report extra1 = 'val1' for every row_name that has at least one of those, change the ORDER BY to:

ORDER  BY row_name, (extra1 <> 'val1')

在顶部排列'val1'。 布尔值表达式的解释(带有更多链接):

Sorts 'val1' on top. Explanation for that boolean expression (with links to more):

  • PostgreSQL: Grouping then filtering table, with condition for nonexistence

在排序顺序不确定的情况下,仍然可以任意选择其他额外列。

Other "extra" columns are still chosen arbitrarily while the sort order is not deterministic.

交叉表基础知识

  • PostgreSQL Crosstab Query
  • Postgresql crosstab query with multiple "row name" columns

这篇关于过滤对crosstab()查询结果的意外影响的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆