How exactly is the value of count(*) determined in BigQuery?


Problem description


I am joining a table of about 70000 rows with a slightly bigger second table using inner join each. Now count(a.business_column) and count(*) give different results. The former correctly reports ~70000, while the latter gives ~200000. But this only happens when I select count(*) alone; when I select both together, they give the same result (~70000). How is this possible?

select
   count(*)
   /*,count(a.business_column)*/

from table_a a
inner join each table_b b
   on b.key_column = a.business_column

Solution

UPDATE: For a step-by-step explanation of how this works, see the related question "BigQuery flattens when using field with same name as repeated field" instead.


To answer the title question: COUNT(*) in BigQuery is always accurate.

The caveat is that in SQL, COUNT(*) and COUNT(column) mean different things: COUNT(*) counts rows, while COUNT(column) counts only the rows where that column is not NULL. The sample query can therefore be interpreted in different ways.
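The general SQL difference between the two forms can be seen in a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in engine, so it illustrates standard SQL semantics only, not BigQuery's flattening behavior; the table `t` and its rows are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for a generic SQL engine here; this is not BigQuery.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, business_column TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "a"), (2, None), (3, "c")],  # hypothetical rows; one NULL value
)

# COUNT(*) counts every row; COUNT(business_column) skips rows where it is NULL.
count_all, count_col = conn.execute(
    "SELECT COUNT(*), COUNT(business_column) FROM t"
).fetchone()
print(count_all, count_col)
```

With one NULL among three rows, the two counts differ by exactly one.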

See: http://www.xaprb.com/blog/2009/04/08/the-dangerous-subtleties-of-left-join-and-count-in-sql/

There they have this sample query:

select user.userid, count(email.subject)
from user
   inner join email on user.userid = email.userid
group by user.userid;

That query turns out to be ambiguous, and the article's author replaces it with a more explicit one, adding this comment:

But what if that’s not what the author of the query meant? There’s no way to really know. There are several possible intended meanings for the query, and there are several different ways to write the query to express those meanings more clearly. But the original query is ambiguous, for a few reasons. And everyone who reads this query afterwards will end up guessing what the original author meant. "I think I can safely change this to…"
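To make that ambiguity concrete, here is a small sketch of how the join type and the choice of COUNT change the result. It again uses Python's sqlite3 module rather than BigQuery, and the `users` and `emails` tables and their rows are hypothetical stand-ins for the article's `user` and `email` tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER);
    CREATE TABLE emails (userid INTEGER, subject TEXT);
    INSERT INTO users VALUES (1), (2);
    -- user 1 has two emails, user 2 has none
    INSERT INTO emails VALUES (1, 'hello'), (1, 'again');
""")

def counts(join_kind):
    """Return (userid, COUNT(*), COUNT(emails.subject)) per user for a join type."""
    sql = f"""
        SELECT users.userid, COUNT(*), COUNT(emails.subject)
        FROM users
        {join_kind} JOIN emails ON users.userid = emails.userid
        GROUP BY users.userid
        ORDER BY users.userid
    """
    return conn.execute(sql).fetchall()

inner_rows = counts("INNER")  # users without emails disappear entirely
left_rows = counts("LEFT")    # users without emails keep one row of NULLs
print(inner_rows)
print(left_rows)
```

With the inner join, user 2 is absent from the result. With the left join, user 2 appears with COUNT(*) equal to 1 but COUNT(emails.subject) equal to 0, which is why the choice between the two forms should be made explicitly.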

