为什么PostgreSQL中的聚合函数不能使用布尔数据类型 [英] Why aggregate functions in PostgreSQL do not work with boolean data type

查看:173
本文介绍了为什么PostgreSQL中的聚合函数不能使用布尔数据类型的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

为什么我们不能在聚合函数中使用布尔值,而不先转换为某种整数类型?在许多情况下,从布尔数据类型的列计算总和,平均值或相关性是非常有意义的。



请考虑以下示例,其中布尔输入必须始终转换为 int 以使其工作:

  select 
sum (boolinput :: int),
avg(boolinput :: int),
max(boolinput :: int),
min(boolinput :: int),
stddev :: int),
corr(boolinput :: int,boolinputb :: int)

(select
(random()> .5):: boolean as boolinput ,
(random()> .5):: boolean as boolinputB
from
generate_series(1,100)
)a
/ pre>

PostgreSQL 文档:


true状态的有效文字值是:
TRUE't' true''y''yes''on''1'



对于false状态,可以使用以下值:
FALSE'f ''false''n''no''''0'


/ code> equals 1 FALSE



允许聚合中的布尔值也会产生一些有趣的副作用 - 我们可以简化许多case语句:



当前版本(干净,易于理解):

  > 50 then 1 else 0 end)from generate_series(1,100)gs; 

使用旧式的转换操作符 ::

 从generate_series(1,100)gs中选择sum((gs> 50) 

直接聚合布尔值(目前不工作):

 从generate_series(1,100)gs中选择sum(gs> 50) 

在其他DBMS中是否可以直接聚合布尔值?为什么在PostgreSQL中不可能?

解决方案


因为定义TRUE等于1,FALSE等于0我不明白为什么强制转换是必要的。


根据你在问题中引用的文档, 1为TRUE,0为FALSE。在C中它不是真的,其中TRUE是任何非零的。



对于这一点,也不是在这方面模仿C的语言,许多。也不适用于Ruby之类的语言,其中任何非Nil /非False的值都为True,包括零和空字符串。也不是POSIX shell及其变体,其中测试返回码如果为零,则返回TRUE,对于任何非零,返回FALSE。



是一个布尔值,具有从一个平台到下一个平台的各种丰富多彩的实现细节;不是整数。



不清楚你是如何期望Postgres平均true / false值。我很怀疑,许多平台会产生一个结果。



甚至求和布尔是尴尬:期望Postgres OR或输入值,或计数为真值

无论如何,有一些布尔聚合函数, bool_or() bool_and()。这些替换了更标准的 any() some()。 Postgres偏离标准的原因是由于潜在的模糊性。根据文档:

  SELECT b1 = ANY((SELECT b2 FROM t2 ...))FROM t1 ...; 




在这里,ANY可以被视为引入子查询,如果子查询返回一行带有布尔值的聚合函数。


http://www.postgresql.org/docs/current/static/functions-aggregate.html


Why we cannot use boolean values in aggregate functions without casting to some integer type first? In many cases it makes perfect sense to calculate sum, average or correlation from columns of boolean data type.

Consider the following example where boolean input has to be always casted to int in order to make it work:

select
   sum(boolinput::int),
   avg(boolinput::int),
   max(boolinput::int),
   min(boolinput::int),
   stddev(boolinput::int),
   corr(boolinput::int,boolinputb::int)   
from
   (select 
      (random() > .5)::boolean as boolinput,
      (random() > .5)::boolean as boolinputB 
    from 
      generate_series(1,100)
   ) a

From PostgreSQL documentation:

Valid literal values for the "true" state are: TRUE 't' 'true' 'y' 'yes' 'on' '1'

For the "false" state, the following values can be used: FALSE 'f' 'false' 'n' 'no' 'off' '0'

Because by definition TRUE equals 1 and FALSE equals 0 I do not understand why casting is necessary.

Allowing boolean in aggregation would have also interesting side effects - we can for example simplify many case statements:

Current version (clean and easy to understand):

select sum(case when gs > 50 then 1 else 0 end) from generate_series(1,100) gs;

Using old fashioned casting operator :::

select sum((gs > 50)::int) from generate_series(1,100) gs;

Direct aggregation of boolean values (not working currently):

select sum(gs > 50) from generate_series(1,100) gs;

Is direct aggregation of boolean values possible in other DBMSs? Why this is not possible in PostgreSQL?

解决方案

Because by definition TRUE equals 1 and FALSE equals 0 I do not understand why casting is necessary.

Per the docs you have quoted in your question, a boolean is not, by definition, 1 for TRUE and 0 for FALSE. It's not true in C either, where TRUE is anything non-zero.

For that matter, nor is it for languages that mimic C in this respect, of which there are many. Nor is it for languages such as Ruby, where anything non-Nil/non-False evaluates to True, including zero and empty strings. Nor is it for POSIX shell and variations thereof, where testing a return code yields TRUE if it is zero, and FALSE for anything non-zero.

Point is, a boolean is a boolean, with all sorts of colorful implementation details from a platform to the next; not an integer.

It's unclear how you were expecting Postgres to average true/false values. I'm suspicious that many if any platform will yield a result for that.

Even summing booleans is awkward: would expecting Postgres to OR the input values, or to count TRUE values?

At any rate, there are some boolean aggregate functions, namely bool_or() and bool_and(). These replace the more standard any() and some(). The reason Postgres deviates from the standard here is due to potential ambiguity. Per the docs:

SELECT b1 = ANY((SELECT b2 FROM t2 ...)) FROM t1 ...;

Here ANY can be considered either as introducing a subquery, or as being an aggregate function, if the subquery returns one row with a Boolean value.

http://www.postgresql.org/docs/current/static/functions-aggregate.html

这篇关于为什么PostgreSQL中的聚合函数不能使用布尔数据类型的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆