添加“计算列"无需重复计算即可进行 BigQuery 查询 [英] Adding a "calculated column" to BigQuery query without repeating the calculations

查看:22
本文介绍了添加“计算列"无需重复计算即可进行 BigQuery 查询的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想在新的第三列中重新使用计算列的值.例如,此查询有效:

I want to resuse value of calculated columns in a new third column. For example, this query works:

select
  countif(cond1) as A,
  countif(cond2) as B,
  countif(cond1)/countif(cond2) as prct_pass
  From 
  Where
  Group By

但是当我尝试使用 A,B 而不是重复计数时,它不起作用,因为 A 和 B 是 无效:

But when I try to use A,B instead of repeating the countif, it doesn't work because A and B are invalid:

select
  countif(cond1) as A,
  countif(cond2) as B,
  A/B as prct_pass
  From 
  Where
  Group By

我可以以某种方式使更具可读性的第二个版本工作吗?这是第一个效率低下吗?

Can I somehow make the more readable second version work ? Is this first one inefficient ?

推荐答案

你应该构造一个子查询(即双选),如

You should construct a subquery (i.e. a double select) like

SELECT A, B, A/B as prct_pass 
FROM 
(
SELECT countif(cond1) as A, 
       countif(cond2) as B 
       FROM <yourtable>
)

将在两个查询中处理相同数量的数据.在子查询一中,您将只执行 2 个 countif(),如果该步骤需要很长时间,那么执行 2 而不是 4 确实应该更有效.

The same amount of data will be processed in both queries. In the subquery one you will do only 2 countif(), in case that step takes a long time then doing 2 instead of 4 should be more efficient indeed.

看一个使用 bigquery 公共数据集的例子:

Looking at an example using bigquery public datasets:

SELECT 
countif(homeFinalRuns>3) as A,
countif(awayFinalRuns>3) as B,
countif(homeFinalRuns>3)/countif(awayFinalRuns>3) as division 
FROM `bigquery-public-data.baseball.games_post_wide`  

SELECT A, B, A/B as division FROM 
(
SELECT countif(homeFinalRuns>3) as A, 
       countif(awayFinalRuns>3) as B 
       FROM `bigquery-public-data.baseball.games_post_wide`  
)

我们可以看到一体完成(没有子查询)实际上稍微快一点.(我针对不等式的不同值运行了 6 次查询,5 次快,1 次慢)

we can see that doing all in one (without a subquery) is actually slightly faster. (I ran the queries 6 times for different values of the inequality, 5 times was faster and one time slower)

无论如何,效率将取决于计算特定数据集中条件的负担.

In any case, the efficiency will depend on how taxing is to compute the condition in your particular dataset.

这篇关于添加“计算列"无需重复计算即可进行 BigQuery 查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆