BigQuery COUNT(DISTINCT value) vs COUNT(value)

Problem description

I found a glitch/bug in BigQuery. We have a table based on bank statistics data at starschema.net:clouddb:bank.Banks_token.

If I run the following query:

SELECT COUNT(*) AS totalrow,
       COUNT(DISTINCT BankId) AS bankidcnt
FROM bank.Banks_token;

I get the following result:

Row   totalrow   bankidcnt
1     9513       9903

My problem: if the table has 9513 rows, how can the distinct count be 9903, which is 390 more than the number of rows in the table?

Solution

In BigQuery (legacy SQL), COUNT(DISTINCT) returns a statistical approximation for all results greater than 1000.

You can provide an optional second argument that sets the threshold above which approximation is used. So if you use COUNT(DISTINCT BankId, 10000) in your example, you should see the exact result, since the table has fewer than 10000 rows and therefore fewer than 10000 distinct BankId values. Note, however, that a larger threshold can be costly in terms of performance.
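A minimal legacy SQL sketch of the fix, placing the default and raised thresholds side by side. For comparison it also calls EXACT_COUNT_DISTINCT(), legacy SQL's exact variant (not mentioned in the original answer). The fully qualified, bracketed table name is an assumption about how to reference the domain-scoped project, and it assumes the table is still accessible.

```sql
#legacySQL
-- Sketch: the #legacySQL prefix selects the legacy dialect.
-- Assumes the table below is still accessible under this name.
SELECT
  COUNT(*) AS totalrow,
  -- Default threshold (1000): approximate once the result exceeds it
  COUNT(DISTINCT BankId) AS bankidcnt_default,
  -- Threshold above the row count, so the result should be exact
  COUNT(DISTINCT BankId, 10000) AS bankidcnt_threshold,
  -- Legacy SQL's exact distinct count, for comparison
  EXACT_COUNT_DISTINCT(BankId) AS bankidcnt_exact
FROM [starschema.net:clouddb:bank.Banks_token]
```

If the approximation explains the discrepancy, bankidcnt_threshold and bankidcnt_exact should agree with each other and should not exceed totalrow.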

See the complete documentation here: https://developers.google.com/bigquery/docs/query-reference#aggfunctions


UPDATE 2017:

With BigQuery #standardSQL, COUNT(DISTINCT) is always exact. For approximate results, use APPROX_COUNT_DISTINCT(). Why would anyone use approximate results? See this article.
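A short standard SQL sketch comparing the exact and approximate functions on the same column. The table path is an assumption: in standard SQL, a domain-scoped project is written as domain:project inside backticks, and the query assumes the table is still available there.

```sql
#standardSQL
-- Sketch: exact vs. approximate distinct counts side by side.
-- Assumes the table is still reachable at this path.
SELECT
  COUNT(*) AS totalrow,
  COUNT(DISTINCT BankId) AS bankidcnt_exact,        -- always exact in standard SQL
  APPROX_COUNT_DISTINCT(BankId) AS bankidcnt_approx  -- estimate; may differ slightly
FROM `starschema.net:clouddb.bank.Banks_token`
```

On a table this small the exact form is cheap. The approximate form mainly pays off on very large tables, where it uses less memory and typically runs faster than an exact count.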
