Resources Exceeded during query execution


Problem description

I'm trying to run a query that joins two large data sets, and I'm hitting the "Resources exceeded during query execution" error. I've read that there are workarounds when using JOIN EACH and GROUP EACH BY, but not what those workarounds would be.

SELECT 
  year(users.firstseen) as first_year,
  month(users.firstseen) as first_month, 
  DATEDIFF(orders.timestamp,users.firstseen) as days_elapsed,
  count(orders.user_key) as count_orders
FROM 
  [project.orders] as orders
JOIN EACH
  [project.users] AS users
ON
  orders.user_key = users.user_key
WHERE orders.store = 'ios'
GROUP EACH BY 1,2,3

The following works:

SELECT
  year(users.firstseen) as firstyear,
  month(users.firstseen) as firstmonth,
  DATEDIFF(orders.timestamp, users.firstseen) as days_elapsed,
  COUNT(users.firstseen) AS count_orders
FROM
  [project.orders] as orders
JOIN EACH (
  SELECT user_key, firstseen
  FROM [project.users]
  WHERE store_key = 'ios'
) as users
ON
  orders.user_key = users.user_key
GROUP BY firstyear, firstmonth, days_elapsed
ORDER BY firstyear, firstmonth, days_elapsed

Recommended answer

JOIN EACH can fail if your join keys (in this case, user_key) are unevenly distributed. For example, if you have one user_key that appears abnormally often, you'll get a "resources exceeded" error from the node that handles that key. Alternatively, you could try running the query over a smaller set of user keys by filtering out some portion of the user keys before the join.
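
One way to check for a skewed key is to count rows per user_key in the larger table and look at the top few. A minimal sketch in legacy SQL, reusing the table and column names from the question (the LIMIT of 10 is an arbitrary choice):

```sql
-- Rows per join key in the orders table, largest first.
-- A key whose count dwarfs the others is a likely cause of the error.
SELECT
  user_key,
  COUNT(*) AS order_count
FROM
  [project.orders]
WHERE store = 'ios'
GROUP EACH BY user_key
ORDER BY order_count DESC
LIMIT 10
```

If one key stands out (for example a NULL or placeholder value), excluding it before the join is worth a try.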

GROUP EACH BY can fail if you have too many distinct group keys. You could try whittling down the join output by adding a few more WHERE clauses in order to see if this is the case.
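
To test whether the number of distinct group keys is the problem, the original query can be narrowed with one extra filter. The sketch below assumes that restricting to users first seen in a single year is acceptable for the experiment; 2013 is a placeholder value, not something taken from the question:

```sql
SELECT
  YEAR(users.firstseen) AS first_year,
  MONTH(users.firstseen) AS first_month,
  DATEDIFF(orders.timestamp, users.firstseen) AS days_elapsed,
  COUNT(orders.user_key) AS count_orders
FROM
  [project.orders] AS orders
JOIN EACH
  [project.users] AS users
ON
  orders.user_key = users.user_key
WHERE orders.store = 'ios'
  -- Placeholder year: fewer users means fewer distinct group keys.
  AND YEAR(users.firstseen) = 2013
GROUP EACH BY 1, 2, 3
```

If this version succeeds while the unfiltered one fails, the size of the grouped output is the likely bottleneck.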

Basically, I'd recommend whittling down the inputs to either the JOIN EACH or the GROUP EACH BY until you get the query to work, and then you'll have a better sense for the limits you're running up against. Once you know that, you can (hopefully) structure your queries to get the most out of the available resources.
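
The whittling-down can be done systematically by keeping only a fraction of the user keys on both sides of the join. This is a sketch rather than a tested recipe: it assumes legacy SQL's HASH function accepts the type of user_key, and the one-in-ten sample is arbitrary.

```sql
SELECT
  YEAR(users.firstseen) AS first_year,
  MONTH(users.firstseen) AS first_month,
  DATEDIFF(orders.timestamp, users.firstseen) AS days_elapsed,
  COUNT(orders.user_key) AS count_orders
FROM (
  -- Keep roughly one tenth of the keys; the same predicate is applied
  -- to both inputs so matching rows survive on each side.
  SELECT user_key, timestamp
  FROM [project.orders]
  WHERE store = 'ios'
    AND ABS(HASH(user_key)) % 10 = 0
) AS orders
JOIN EACH (
  SELECT user_key, firstseen
  FROM [project.users]
  WHERE ABS(HASH(user_key)) % 10 = 0
) AS users
ON
  orders.user_key = users.user_key
GROUP EACH BY 1, 2, 3
```

Raising the kept fraction step by step (for example `% 10 < 2`, then `< 5`) until the error returns gives a rough idea of how much input the query can handle.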

(BTW, we expect to tune these operations in the near future to remove some of the limits you may be hitting!)
