"Error: Resources exceeded during query execution" resulting from SQL call using FLATTEN


Problem description

I am currently attempting to run the following SQL script in BigQuery (with the goal of saving the result as a table), but I get a capacity-related processing error when I run the query without a limit on the number of rows returned. The error is the common one: "Error: Resources exceeded during query execution."

SELECT date, 
       Concat(fullvisitorid, String(visitid)) AS unique_visit_id, 
       visitid, 
       visitnumber, 
       fullvisitorid, 
       totals.pageviews, 
       totals.bounces, 
       hits.page.pagepath, 
       hits.page.pagetitle, 
       device.devicecategory, 
       device.browser, 
       device.browserversion, 
       hits.customvariables.index, 
       hits.customvariables.customvarname, 
       hits.customvariables.customvarvalue, 
       hits.time 
FROM   (Flatten([XXXXXXXX.ga_sessions_20140711], hits.time)) 
WHERE  hits.customvariables.index = 4 
ORDER  BY unique_visit_id DESC, 
          hits.time ASC 

The job ID that was provided for the job is: ua-web-analytics:job_60fxNnmo9gZ23kaji50v3qH9dXs. I have read the other posts on the topic of these errors, such as one that focuses on the resource errors observed when completing a join. I suspect that the issue right now is with the use of FLATTEN, and I am working through some different approaches. That said, I am concerned because, in the future, this query may be run over 30 or 60 days of data together (versus just the single day that I am prototyping on right now), which will dramatically increase the data size to somewhere between 500 GB and 1 TB. The goal of the above query was to generate a table which I could save and then operate on. Unfortunately, doing this in an ad hoc manner seems somewhat problematic. Has anyone else encountered resource constraints when using a similar SQL query? For context, the table being queried is about 17.2 GB in size, with just over a million rows.

Solution

As @Pentium10 mentioned, setting "allow large results" will allow you to return the larger results from the flattened query. Usually the signal that you should use "allow large results" is that you see a "result too large" error.

However, there is another part of your query that is unparallelizable: the ORDER BY operation. Is this required? Usually, we've found that most of the time when ORDER BY is used on large tables, what people really want is an ORDER BY ... LIMIT (which can be done efficiently and in parallel). Or they are just adding the ORDER BY because it makes it easier to eyeball the results. If you can drop the ORDER BY it will likely make your query faster and scale better as the data size increases.
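As a rough sketch of the two alternatives described above, the original query could be changed along these lines. The column list is shortened for readability, and the `LIMIT` value of 1000 is an arbitrary placeholder rather than a recommended number; both are assumptions made for illustration and should be adapted to the actual use case.

```sql
-- Option 1 (legacy BigQuery SQL): drop the ORDER BY entirely.
-- Suitable when the result is saved to a table and sorted later, if at all.
SELECT date,
       CONCAT(fullvisitorid, STRING(visitid)) AS unique_visit_id,
       hits.page.pagepath,
       hits.customvariables.customvarvalue,
       hits.time
FROM   (FLATTEN([XXXXXXXX.ga_sessions_20140711], hits.time))
WHERE  hits.customvariables.index = 4;

-- Option 2: keep the ORDER BY but bound it with a LIMIT,
-- for cases where only the top rows are needed for inspection.
-- The value 1000 is a placeholder chosen for this example.
SELECT date,
       CONCAT(fullvisitorid, STRING(visitid)) AS unique_visit_id,
       hits.page.pagepath,
       hits.customvariables.customvarvalue,
       hits.time
FROM   (FLATTEN([XXXXXXXX.ga_sessions_20140711], hits.time))
WHERE  hits.customvariables.index = 4
ORDER  BY unique_visit_id DESC,
          hits.time ASC
LIMIT  1000;
```

Option 1 fits the stated goal of generating a table to operate on afterwards, since row order in a saved table is not something later queries can rely on anyway. Option 2 is mainly useful for eyeballing a sample of the sorted output.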
