蜂巢:指定组的总和(HiveQL) [英] Hive: Sum over a specified group (HiveQL)

查看:67
本文介绍了蜂巢:指定组的总和(HiveQL)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一张桌子:

key    product_code    cost
1      UK              20
1      US              10
1      EU              5
2      UK              3
2      EU              6

我想找到每个键"组的所有乘积之和,并追加到每一行.例如,对于键= 1,找到所有产品的成本总和(20 + 10 + 5 = 35),然后将结果附加到与键= 1对应的所有行中.因此,最终结果:

I would like to find the sum of all products for each group of "key" and append to each row. For example for key = 1, find the sum of costs of all products (20+10+5=35) and then append result to all rows which correspond to the key = 1. So end result:

key    product_code    cost     total_costs
1      UK              20       35
1      US              10       35
1      EU              5        35
2      UK              3        9
2      EU              6        9

我宁愿不使用子联接来执行此操作,因为这样效率低下.我最好的主意是将 over 函数与 sum 函数结合使用,但是我无法使其正常工作.我最好的尝试:

I would prefer to do this without using a sub-join as this would be inefficient. My best idea would be to use the over function in conjunction with the sum function but I cant get it to work. My best try:

SELECT key, product_code, sum(costs) over(PARTITION BY key)
FROM test
GROUP BY key, product_code;

IV看了docs ,但是我不知道该如何解决.我正在使用Hive v0.12.0,HDP v2.0.6,HortonWorks Hadoop发行版.

Iv had a look at the docs but there so cryptic I have no idea how to work out how to do it. Im using Hive v0.12.0, HDP v2.0.6, HortonWorks Hadoop distribution.

推荐答案

类似于@VB_答案,请使用无限制的先行和无限制的跟随

Similar to @VB_ answer, use the BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING statement.

HiveQL查询因此是:

The HiveQL query is therefore:

SELECT key, product_code,
SUM(costs) OVER (PARTITION BY key ORDER BY key ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM test;

这篇关于蜂巢:指定组的总和(HiveQL)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆