将带变量的循环转换为BigQuery SQL [英] Converting loop with variables to BigQuery SQL
问题描述
我有成千上万个脚本,其中包括循环遍历数据集并使用变量进行累加.例如:
I have thousands of script that include loop over dataset and use variables for accumulation. Eg:
// assuming that 'ids' is populated from some BQ table
ids = [1, 2, 3, 4, 5]
var1 = "v1" //initialize variable
var2 = "v2" //initialize variable
for id in ids
var1 = var2
if (id > 2)
var2 = var1 + "-" + item
else
var2 = id
print(id, var1, var2)
这将产生以下输出:
1,v2,1
2,1,2
3,2,2-3
4,2-3,2-3-4
5,2-3-4,2-3-4-5
要考虑的其他几件事:
- 脚本可以包含n个变量.
- 变量可以充当 累加器,也可以只包含文字.
- 循环中的条件可以是
更复杂,有多个嵌套的
if
/else
条件. - 脚本也可以具有嵌套循环,该循环将循环n次或直到不满足特定条件为止.
- Script can involve n number of variables.
- Variables can act as accumulators or can simply have literals.
- Condition in loop can be
more complex, multiple nested
if
/else
conditions. - Script can have nested loop as well which would loop for n times or till certain condition is not met.
是否可以在BigQuery SQL中支持它?我也对BigQuery UDF开放,但是UDF不支持可变状态.
Is it possible to support it in BigQuery SQL? I'm open for BigQuery UDF as well but UDF does not support mutable state.
此外,由于有成千上万个这样的脚本,因此我希望自动化转换,而不是手动完成.解析这些脚本对我来说不是问题,我只需要将其转换为等效的BigQuery SQL.让我知道如何解决这个问题.
Also since there are thousands of such scripts, instead of doing it manually, I want to automate conversion. Parsing these scripts is not an issue for me, I only need to convert this to equivalent BigQuery SQL. Let me know how I can approach this problem.
Though I'm more inclined to use SQL only approach but BigQuery script also seems promising option.
推荐答案
我能够为上述示例脚本转换SQL查询:
I was able to convert SQL query for the aforementioned sample script:
with t as
(select 1 as id union all
select 2 as id union all
select 3 as id union all
select 4 as id union all
select 5 as id)
select id, if(id = 1, 'v2', lag(var2) over(order by id)) var1, var2
from (
select
id,
if(
id > 2,
string_agg(if(id > 1, concat('', '', cast(id as string)), null), '-') over (rows between unbounded preceding and current row),
cast(id as string)
) var2
from t
)
这样,我还可以转换大多数纯SQL脚本.只是围绕它建立模型会涉及更多的问题,但至少我正在进行手动转换.
And like that I'm also able to convert most of the scripts in pure SQL. It's just building model around it is more involved but at least I'm moving ahead with manual conversions.
这篇关于将带变量的循环转换为BigQuery SQL的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!