BigQuery UDF Internal Error


Problem description

We have a simple UDF in BigQuery that, for some reason, keeps failing with this error:

Query Failed
Error: An internal error occurred and the request could not be completed.

The query simply uses the UDF to compute the SHA-256 hash of each distinct title.

SELECT
  input AS title,
  input_sha256 AS title_sha256
FROM
  SHA256(
      SELECT
        title AS input
      FROM
        [bigquery-public-data:hacker_news.stories]
      GROUP BY
        input 
  )
LIMIT
  1000
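To make the intent of the query concrete, here is a minimal JavaScript sketch, assuming Node.js, that emulates what the query should return for an in-memory list of titles. It is an illustration only, not BigQuery's execution model: `sha256Hex` and `emulateQuery` are hypothetical names, Node's built-in `crypto` module stands in for CryptoJS, and because `GROUP BY ... LIMIT` guarantees no particular row order, the sketch simply keeps the first distinct titles it sees.

```javascript
// Hypothetical local emulation of the query above; not part of the original post.
// Node's built-in crypto module stands in for CryptoJS (an assumption).
const crypto = require("crypto");

// SHA-256 of a UTF-8 string, returned as a lowercase hex digest.
function sha256Hex(text) {
  return crypto.createHash("sha256").update(text, "utf8").digest("hex");
}

// titles: plain strings standing in for hacker_news.stories.title.
function emulateQuery(titles, limit = 1000) {
  // GROUP BY input: keep one copy of each distinct title, in first-seen order.
  const distinct = [...new Set(titles)];
  // LIMIT: SQL guarantees no order here; this sketch just takes the first `limit`.
  return distinct.slice(0, limit).map((input) => ({
    title: input, // input AS title
    // This sketch maps a NULL title to a NULL hash; the original UDF has no null check.
    title_sha256: input === null ? null : sha256Hex(input),
  }));
}
```

For example, `emulateQuery(["abc", "abc", "hello"])` returns two rows, and the first has `title_sha256` equal to `ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad`, the standard SHA-256 test vector for "abc".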

The inline UDF is pasted below. However, I cannot post the full UDF because StackOverflow complains about too much code in the post. The full UDF can be seen in this gist.

// CryptoJS is loaded by the full UDF source (see the gist); it is not shown here.
function sha256(row, emit) {
  emit(
      {
        input: row.input,
        input_sha256: CryptoJS.SHA256(row.input).toString(CryptoJS.enc.Hex)
      }
  );
}

bigquery.defineFunction(
  'SHA256',    // Name of the function exported to SQL
  ['input'],   // Names of input columns
  [            // Output schema
      {'name': 'input', 'type': 'string'},
      {'name': 'input_sha256', 'type': 'string'}
  ],
  sha256       // Reference to JavaScript UDF
);
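To check that the JavaScript itself is correct independently of BigQuery, one option is a small local harness. Below is a minimal sketch, assuming Node.js. Everything except the UDF body is hypothetical: `bigquery` is a stub that only records `defineFunction` calls (it is not the object BigQuery provides), `CryptoJS` is a tiny shim over Node's `crypto` module covering only `SHA256(...).toString(CryptoJS.enc.Hex)`, and `runUdf` is an illustrative helper that feeds rows through the registered function via an `emit` callback.

```javascript
// Hypothetical local harness for a legacy BigQuery JavaScript UDF; not a BigQuery API.
const nodeCrypto = require("crypto");

// Minimal stand-in for the part of CryptoJS this UDF uses (an assumption; the real
// UDF bundles the CryptoJS library from the gist). Only the Hex encoder is supported.
const CryptoJS = {
  enc: { Hex: "hex" },
  SHA256(text) {
    const digest = nodeCrypto.createHash("sha256").update(text, "utf8").digest("hex");
    return {
      toString(encoder) {
        if (encoder !== "hex") throw new Error("this shim only supports CryptoJS.enc.Hex");
        return digest;
      },
    };
  },
};

// Stub of the `bigquery` global: records each definition so it can be run locally.
const registry = new Map();
const bigquery = {
  defineFunction(name, inputColumns, outputSchema, fn) {
    registry.set(name, { inputColumns, outputSchema, fn });
  },
};

// The UDF from the question, logic unchanged.
function sha256(row, emit) {
  emit({
    input: row.input,
    input_sha256: CryptoJS.SHA256(row.input).toString(CryptoJS.enc.Hex),
  });
}

bigquery.defineFunction(
  "SHA256",
  ["input"],
  [
    { name: "input", type: "string" },
    { name: "input_sha256", type: "string" },
  ],
  sha256
);

// Run a registered UDF over an array of rows and collect everything it emits.
function runUdf(name, rows) {
  const { inputColumns, outputSchema, fn } = registry.get(name);
  const results = [];
  for (const row of rows) {
    // This stub passes only the declared input columns to the UDF.
    const projected = {};
    for (const col of inputColumns) projected[col] = row[col];
    fn(projected, (record) => {
      // Simplistic check: every "string" field in the output schema must be a string.
      for (const field of outputSchema) {
        if (field.type === "string" && typeof record[field.name] !== "string") {
          throw new TypeError(`emitted field ${field.name} is not a string`);
        }
      }
      results.push(record);
    });
  }
  return results;
}
```

Running `runUdf("SHA256", [{ input: "abc" }])` returns one record whose `input_sha256` is `ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad`. A clean local run only rules out bugs in the UDF logic; it cannot reproduce a failure inside BigQuery's own UDF runtime.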

Not sure if it helps, but the job ID is:

bigquery:bquijob_7fd3b51c_153c058dc7c

There appears to be a similar issue at:

https://code.google.com/p/google-bigquery/issues/detail?id=478

Solution

Short answer - this is an issue related to memory allocation that I uncovered via my own testing and fixed today, but it will take a little while to flow out to production.

Slightly longer answer - we just rolled out a fix today for an issue where users were hitting "out of memory" errors when scaling their UDFs up to larger numbers of rows, even though the same UDF succeeded on smaller numbers of rows. The queries that were hitting that condition now run fine on our internal/test trees. However, because public BigQuery hosts carry much higher traffic loads, the JavaScript engine that executes UDFs (V8) behaves somewhat differently in production than it does on internal trees. Specifically, some of the previously OOMing jobs are now hitting a new memory allocation error that we couldn't observe until the queries ran on a fully loaded tree.

It's a minor error with a quick fix, but we'd ideally let it flow through our regular testing and QA cycle. This should put the fix in production in about a week, assuming nothing else goes wrong with the candidate. Would that be acceptable for you?
