Resources Exceeded during query execution

Question

I'm trying to run a query joining two large data sets, and I'm hitting the "resources exceeded during query execution" error. I've read that there are workarounds when using JOIN EACH and GROUP EACH BY, but not what those workarounds would be.

SELECT 
  year(users.firstseen) as first_year,
  month(users.firstseen) as first_month, 
  DATEDIFF(orders.timestamp,users.firstseen) as days_elapsed,
  count(orders.user_key) as count_orders
FROM 
  [project.orders] as orders
JOIN EACH
  [project.users] AS users
ON
  orders.user_key = users.user_key
WHERE orders.store = 'ios'
GROUP EACH BY 1,2,3

Edit: the following worked:

SELECT
  YEAR(users.firstseen) AS firstyear,
  MONTH(users.firstseen) AS firstmonth,
  DATEDIFF(orders.timestamp, users.firstseen) AS days_elapsed,
  COUNT(users.firstseen) AS count_orders
FROM
  [project.orders] AS orders
JOIN EACH (
  SELECT user_key, firstseen
  FROM [project.users]
  WHERE store_key = 'ios'
) AS users
ON
  orders.user_key = users.user_key
GROUP BY firstyear, firstmonth, days_elapsed
ORDER BY firstyear, firstmonth, days_elapsed


Recommended answer

JOIN EACH can fail if your join keys (in this case, user_key) are unevenly distributed. For example, if you have one user_key that appears abnormally often, you'll get a "resources exceeded" error from the node that handles that key. Alternatively, you could try running the query over a smaller set of user keys by filtering out some portion of the user keys before the join.
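A rough way to check for the skew described above, written in the same legacy BigQuery SQL as the question. The first statement lists the user_key values with the most rows in the orders table; the second restricts the join input to roughly a tenth of the keys. The LIMIT of 20, the one-in-ten bucket, and the use of HASH() for bucketing are illustrative assumptions, not something the answer prescribes, and the table and column names are taken from the question's first query.

```sql
-- 1) Look for user_key values that appear abnormally often.
SELECT
  user_key,
  COUNT(*) AS row_count
FROM
  [project.orders]
WHERE
  store = 'ios'
GROUP EACH BY
  user_key
ORDER BY
  row_count DESC
LIMIT 20;

-- 2) Run the join over a smaller set of user keys.
--    ABS(HASH(user_key)) % 10 = 0 keeps about one key in ten (assumed sampling rule).
SELECT
  YEAR(users.firstseen) AS first_year,
  MONTH(users.firstseen) AS first_month,
  DATEDIFF(orders.timestamp, users.firstseen) AS days_elapsed,
  COUNT(orders.user_key) AS count_orders
FROM (
  SELECT user_key, timestamp
  FROM [project.orders]
  WHERE store = 'ios'
    AND ABS(HASH(user_key)) % 10 = 0
) AS orders
JOIN EACH
  [project.users] AS users
ON
  orders.user_key = users.user_key
GROUP EACH BY 1, 2, 3;
```

If the sampled query succeeds while the full one fails, and the first statement shows a few keys with very large counts, skewed join keys are a likely cause.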

GROUP EACH BY can fail if you have too many distinct group keys. You could try whittling down the join output by adding a few more WHERE clauses in order to see if this is the case.
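As a sketch of what "a few more WHERE clauses" might look like, the question's first query can be limited to users first seen in a single year, which cuts the number of distinct (year, month, days_elapsed) groups. The year 2013 is a hypothetical value chosen only for illustration.

```sql
SELECT
  YEAR(users.firstseen) AS first_year,
  MONTH(users.firstseen) AS first_month,
  DATEDIFF(orders.timestamp, users.firstseen) AS days_elapsed,
  COUNT(orders.user_key) AS count_orders
FROM
  [project.orders] AS orders
JOIN EACH
  [project.users] AS users
ON
  orders.user_key = users.user_key
WHERE
  orders.store = 'ios'
  AND YEAR(users.firstseen) = 2013  -- hypothetical extra filter to shrink the join output
GROUP EACH BY 1, 2, 3;
```

If this narrower query runs, widening the filter step by step gives a feel for how many group keys the query can handle.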

Basically, I'd recommend whittling down the inputs to either the JOIN EACH or the GROUP EACH BY until you get the query to work, and then you'll have a better sense for the limits you're running up against. Once you know that, you can (hopefully) structure your queries to get the most out of the available resources.

(BTW, we expect to tune these operations in the near future to remove some of the limits you may be hitting!)
