"Error: Resources exceeded during query execution" resulting from SQL call using FLATTEN


Problem description

I am currently trying to run the following SQL script in BigQuery (with the goal of saving the table out), but after I start the query without a row return size limit I get a capacity-related SQL processing error. The error is the common one: "Error: Resources exceeded during query execution."

SELECT date, 
       Concat(fullvisitorid, String(visitid)) AS unique_visit_id, 
       visitid, 
       visitnumber, 
       fullvisitorid, 
       totals.pageviews, 
       totals.bounces, 
       hits.page.pagepath, 
       hits.page.pagetitle, 
       device.devicecategory, 
       device.browser, 
       device.browserversion, 
       hits.customvariables.index, 
       hits.customvariables.customvarname, 
       hits.customvariables.customvarvalue, 
       hits.time 
FROM   (Flatten([XXXXXXXX.ga_sessions_20140711], hits.time)) 
WHERE  hits.customvariables.index = 4 
ORDER  BY unique_visit_id DESC, 
          hits.time ASC 
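
To make the row growth concrete, here is a simplified pure-Python model of what the query above returns. It assumes the result holds one row per (session, hit, custom variable) whose index is 4; the dataclasses and `flatten_and_filter` are hypothetical stand-ins for the nested GA export schema, not BigQuery's actual FLATTEN implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical, heavily simplified stand-ins for the nested GA export schema:
# a session holds repeated hits, and each hit holds repeated custom variables.
@dataclass
class CustomVariable:
    index: int
    customvarname: str
    customvarvalue: str


@dataclass
class Hit:
    time: int
    pagepath: str
    customvariables: list[CustomVariable] = field(default_factory=list)


@dataclass
class Session:
    fullvisitorid: str
    visitid: int
    hits: list[Hit] = field(default_factory=list)


def flatten_and_filter(sessions: list[Session], wanted_index: int = 4) -> list[dict]:
    """Emit one flat row per (session, hit, custom variable) with a matching index."""
    rows = []
    for s in sessions:
        # Mirrors Concat(fullvisitorid, String(visitid)) in the query.
        unique_visit_id = s.fullvisitorid + str(s.visitid)
        for h in s.hits:
            for cv in h.customvariables:
                if cv.index == wanted_index:  # WHERE hits.customvariables.index = 4
                    rows.append({
                        "unique_visit_id": unique_visit_id,
                        "pagepath": h.pagepath,
                        "customvarname": cv.customvarname,
                        "customvarvalue": cv.customvarvalue,
                        "time": h.time,
                    })
    # ORDER BY unique_visit_id DESC, time ASC: Python's sort is stable,
    # so sort by the secondary key first and then by the primary key.
    rows.sort(key=lambda r: r["time"])
    rows.sort(key=lambda r: r["unique_visit_id"], reverse=True)
    return rows
```

In this model, a single session with many matching hits becomes many output rows, and the final two sorts need every row in hand before they can finish.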

The job ID provided for the job is: ua-web-analytics:job_60fxNnmo9gZ23kaji50v3qH9dXs. I have read other posts on this error, such as this post, which focuses on resource errors observed while completing a join. I suspect the problem right now is the use of FLATTEN, and I am working through some different approaches. That said, I am concerned because in the future this query may be run over 30 or 60 days together (versus the single day I am prototyping on now), which will dramatically increase the data size, to anywhere from over 500 GB up to about 1 TB. The goal of the query above is to generate a table that I can save out and then operate on. Unfortunately, doing this in an ad hoc manner seems somewhat problematic. Has anyone else run into resource constraints with a similar SQL query? For context, the table being queried is about 17.2 GB in size, with just over a million rows.
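
As a rough check on the 500 GB to 1 TB estimate, assuming each daily ga_sessions table is about as large as the 17.2 GB day used for prototyping (an assumption; real daily sizes vary with traffic):

$$
17.2\,\text{GB/day} \times 30\,\text{days} = 516\,\text{GB}, \qquad 17.2\,\text{GB/day} \times 60\,\text{days} = 1032\,\text{GB} \approx 1\,\text{TB}
$$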

Recommended answer

As @Pentium10 mentioned, enabling "allow large results" lets you return the larger results of the flattened query. Usually, the signal that you need "allow large results" is that you see a "result too large" error.

However, another part of your query cannot be parallelized: the ORDER BY operation. Is it required? In our experience, when ORDER BY is used on large tables, what people usually want is ORDER BY ... LIMIT (which can be done efficiently and in parallel), or they add the ORDER BY only because it makes the results easier to eyeball. If you can drop the ORDER BY, your query will likely run faster and scale better as the data size increases.
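
The reason ORDER BY ... LIMIT parallelizes can be sketched in a few lines. This is a simplified model, not BigQuery's actual execution plan: lists of integers stand in for table shards, and `local_top_k`, `order_by_limit`, and `order_by_only` are hypothetical names. Each shard keeps only its own k largest values, and because the global top k must be among those local candidates, the final merge touches at most `len(shards) * k` values instead of every row.

```python
import heapq
from typing import List, Sequence


def local_top_k(shard: Sequence[int], k: int) -> List[int]:
    # Runs independently on each shard, so every call could go to a separate worker.
    return heapq.nlargest(k, shard)


def order_by_limit(shards: Sequence[Sequence[int]], k: int) -> List[int]:
    # Phase 1: each shard shrinks itself to at most k candidates.
    candidates = [v for shard in shards for v in local_top_k(shard, k)]
    # Phase 2: the final merge sees at most len(shards) * k values,
    # instead of every row in the table.
    return heapq.nlargest(k, candidates)


def order_by_only(shards: Sequence[Sequence[int]]) -> List[int]:
    # In this sketch, a bare ORDER BY must produce one global order over
    # every row, so all data flows through a single final sort.
    return sorted((v for shard in shards for v in shard), reverse=True)
```

Both functions agree on the first k values; the difference is how much data the last, non-parallel step has to handle.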
