将带变量的循环转换为BigQuery SQL [英] Converting loop with variables to BigQuery SQL

查看:114
本文介绍了将带变量的循环转换为BigQuery SQL的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有成千上万个脚本,其中包括循环遍历数据集并使用变量进行累加.例如:

I have thousands of script that include loop over dataset and use variables for accumulation. Eg:

// assuming that 'ids' is populated from some BQ table
ids = [1, 2, 3, 4, 5]
var1 = "v1"   //initialize variable
var2 = "v2"   //initialize variable

for id in ids
    var1 = var2
    if (id > 2)
      var2 = var1 + "-" + item
    else
      var2 = id
    print(id, var1, var2)

这将产生以下输出:

1,v2,1
2,1,2
3,2,2-3
4,2-3,2-3-4
5,2-3-4,2-3-4-5

要考虑的其他几件事:

  • 脚本可以包含n个变量.
  • 变量可以充当 累加器,也可以只包含文字.
  • 循环中的条件可以是 更复杂,有多个嵌套的if/else条件.
  • 脚本也可以具有嵌套循环,该循环将循环n次或直到不满足特定条件为止.
  • Script can involve n number of variables.
  • Variables can act as accumulators or can simply have literals.
  • Condition in loop can be more complex, multiple nested if / else conditions.
  • Script can have nested loop as well which would loop for n times or till certain condition is not met.

是否可以在BigQuery SQL中支持它?我也对BigQuery UDF开放,但是UDF不支持可变状态.

Is it possible to support it in BigQuery SQL? I'm open for BigQuery UDF as well but UDF does not support mutable state.

此外,由于有成千上万个这样的脚本,因此我希望自动化转换,而不是手动完成.解析这些脚本对我来说不是问题,我只需要将其转换为等效的BigQuery SQL.让我知道如何解决这个问题.

Also since there are thousands of such scripts, instead of doing it manually, I want to automate conversion. Parsing these scripts is not an issue for me, I only need to convert this to equivalent BigQuery SQL. Let me know how I can approach this problem.

修改: 虽然我更倾向于使用仅SQL方法,但

Though I'm more inclined to use SQL only approach but BigQuery script also seems promising option.

推荐答案

我能够为上述示例脚本转换SQL查询:

I was able to convert SQL query for the aforementioned sample script:

with  t as
(select 1 as id union all
select 2 as id union all
select 3 as id union all
select 4 as id union all
select 5 as id)

select id, if(id = 1, 'v2', lag(var2) over(order by id)) var1, var2
from (
    select
      id,
      if(
        id > 2,
        string_agg(if(id > 1, concat('', '', cast(id as string)), null), '-') over (rows between unbounded preceding and current row),
        cast(id as string)
      ) var2
    from t
  )

这样,我还可以转换大多数纯SQL脚本.只是围绕它建立模型会涉及更多的问题,但至少我正在进行手动转换.

And like that I'm also able to convert most of the scripts in pure SQL. It's just building model around it is more involved but at least I'm moving ahead with manual conversions.

这篇关于将带变量的循环转换为BigQuery SQL的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆