BigQuery UDF Internal Error
We had a simple UDF in BigQuery that kept failing with the following error:
Query Failed
Error: An internal error occurred and the request could not be completed.
The query was simply trying to use a UDF to perform a SHA256.
SELECT
input AS title,
input_sha256 AS title_sha256
FROM
SHA256(
SELECT
title AS input
FROM
[bigquery-public-data:hacker_news.stories]
GROUP BY
input
)
LIMIT
1000
The in-line UDF is pasted below. However, I cannot post the full UDF because StackOverflow complains about too much code in the post. The full UDF can be seen in this gist.
function sha256(row, emit) {
emit(
{
input: row.input,
input_sha256: CryptoJS.SHA256(row.input).toString(CryptoJS.enc.Hex)
}
);
}
bigquery.defineFunction(
'SHA256', // Name of the function exported to SQL
['input'], // Names of input columns
[
{'name': 'input', 'type': 'string'},
{'name': 'input_sha256', 'type': 'string'}
],
sha256 // Reference to JavaScript UDF
);
Not sure if it helps, but the Job-ID is
bigquery:bquijob_7fd3b51c_153c058dc7c
Looks like there is a similar issue at:
https://code.google.com/p/google-bigquery/issues/detail?id=478
Short answer - this is an issue related to memory allocation that I uncovered via my own testing and fixed today, but it will take a little while to flow out to production.
Slightly longer answer - we just rolled out a fix today for an issue where users were hitting "out of memory" errors when scaling their UDFs up to larger numbers of rows, even though the UDF would succeed on smaller numbers of rows. The queries that were hitting that condition are now running fine on our internal / test trees. However, since public BigQuery hosts have much higher traffic loads, the JavaScript engine that executes the UDFs (V8) behaves somewhat differently in production than it does in internal trees. Specifically, there's a new memory allocation error that some of the previously OOMing jobs are now hitting, which we couldn't observe until the queries ran on a fully-loaded tree.
It's a minor error with a quick fix, but we'd ideally let it flow through our regular testing and QA cycle. This should put the fix in production in about a week, assuming nothing else goes wrong with the candidate. Would that be acceptable to you?