How can I pass a row from my table to a UDF without specifying the complete type?


Question

Let's say that I want to use a JavaScript UDF to do some processing on a table that has a nested structure (such as the sample GitHub commits). I may want to change the fields that I look at in the UDF as I iterate on its implementation, so I decide to just pass entire rows from the table to it. My UDF ends up looking something like this:

#standardSQL
CREATE TEMP FUNCTION GetCommitStats(
  input STRUCT<commit STRING, tree STRING, parent ARRAY<STRING>,
               author STRUCT<name STRING, email STRING, ...>>)
  RETURNS STRUCT<
    parent ARRAY<STRING>,
    author_name STRING,
    diff_count INT64>
  LANGUAGE js AS """
[UDF content here]
""";

Then I call the function with a query such as:

SELECT GetCommitStats(t).*
FROM `bigquery-public-data.github_repos.sample_commits` AS t;

The most cumbersome part of the UDF declaration is the input struct, since I have to include all of the nested fields and their types. Is there a better way to do this?

Solution

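You can use TO_JSON_STRING to convert arbitrary structs and arrays to JSON, then parse that JSON inside your UDF into an object for further processing. The UDF then only needs a single STRING parameter, so the input struct type never has to be spelled out. For example: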

#standardSQL
CREATE TEMP FUNCTION GetCommitStats(json_str STRING)
  RETURNS STRUCT<
    parent ARRAY<STRING>,
    author_name STRING,
    diff_count INT64>
  LANGUAGE js AS """
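// json_str is the whole row, serialized by TO_JSON_STRING in the query below.
var row = JSON.parse(json_str);
// Build an object whose keys match the field names in the RETURNS struct.
var result = new Object();
result['parent'] = row.parent;
result['author_name'] = row.author.name;
result['diff_count'] = row.difference.length;
return result;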
""";

SELECT GetCommitStats(TO_JSON_STRING(t)).*
FROM `bigquery-public-data.github_repos.sample_commits` AS t;
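
Because the UDF body is ordinary JavaScript, it can be tried outside BigQuery before being pasted into the query. The following is only a minimal sketch and not part of the original answer: `getCommitStats` is a hypothetical local wrapper around the same body, and `sampleRow` is made-up data shaped roughly like what `TO_JSON_STRING(t)` would produce for one row.

```javascript
// Hypothetical local stand-in for the UDF body. In BigQuery the argument
// would be the string produced by TO_JSON_STRING(t).
function getCommitStats(jsonStr) {
  const row = JSON.parse(jsonStr);
  return {
    parent: row.parent,
    author_name: row.author.name,
    diff_count: row.difference.length,
  };
}

// Made-up row for illustration only; the real table has more fields.
const sampleRow = JSON.stringify({
  commit: "abc123",
  parent: ["def456"],
  author: { name: "Jane Doe", email: "jane@example.com" },
  difference: [{ path: "README.md" }, { path: "main.c" }],
});

const stats = getCommitStats(sampleRow);
console.log(stats);
```

The keys of the returned object need to match the field names declared in the `RETURNS STRUCT<...>` clause, which is why the sketch keeps `parent`, `author_name`, and `diff_count` unchanged.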

If you want to cut down on the number of columns that are scanned, you can pass a struct of the relevant columns to TO_JSON_STRING instead:

#standardSQL
CREATE TEMP FUNCTION GetCommitStats(json_str STRING)
  RETURNS STRUCT<
    parent ARRAY<STRING>,
    author_name STRING,
    diff_count INT64>
  LANGUAGE js AS """
var row = JSON.parse(json_str);
var result = new Object();
result['parent'] = row.parent;
result['author_name'] = row.author.name;
result['diff_count'] = row.difference.length;
return result;
""";

SELECT
  GetCommitStats(TO_JSON_STRING(
    STRUCT(parent, author, difference)
  )).*
FROM `bigquery-public-data.github_repos.sample_commits`;
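
One thing to watch for with the JSON approach, offered as an assumption about your data rather than something the answer above covers: a NULL value is serialized as JSON `null`, so an expression such as `row.author.name` would throw for a row whose `author` is NULL. A minimal defensive sketch of the same body, using the hypothetical name `getCommitStatsSafe`, might look like this:

```javascript
// Hypothetical defensive variant of the UDF body: tolerates fields that are
// null or absent in the JSON instead of throwing.
function getCommitStatsSafe(jsonStr) {
  const row = JSON.parse(jsonStr);
  return {
    parent: row.parent || [],
    author_name: row.author ? row.author.name : null,
    diff_count: Array.isArray(row.difference) ? row.difference.length : 0,
  };
}

// Made-up input with a null author and no "difference" key at all.
const partial = getCommitStatsSafe('{"parent":["def456"],"author":null}');
console.log(partial);
```

Whether falling back to `null` and `0` is appropriate depends on what the downstream query expects; the fallbacks here are only illustrative.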
