蜂巢:指定组的总和(HiveQL) [英] Hive: Sum over a specified group (HiveQL)
问题描述
我有一张桌子:
key product_code cost
1 UK 20
1 US 10
1 EU 5
2 UK 3
2 EU 6
我想找到每个键"组的所有乘积之和,并追加到每一行.例如,对于键= 1,找到所有产品的成本总和(20 + 10 + 5 = 35),然后将结果附加到与键= 1对应的所有行中.因此,最终结果:
I would like to find the sum of all products for each group of "key" and append to each row. For example for key = 1, find the sum of costs of all products (20+10+5=35) and then append result to all rows which correspond to the key = 1. So end result:
key product_code cost total_costs
1 UK 20 35
1 US 10 35
1 EU 5 35
2 UK 3 9
2 EU 6 9
我宁愿不使用子联接来执行此操作,因为这样效率低下.我最好的主意是将 over
函数与 sum
函数结合使用,但是我无法使其正常工作.我最好的尝试:
I would prefer to do this without using a sub-join as this would be inefficient. My best idea would be to use the over
function in conjunction with the sum
function but I cant get it to work. My best try:
SELECT key, product_code, sum(costs) over(PARTITION BY key)
FROM test
GROUP BY key, product_code;
IV看了docs ,但是我不知道该如何解决.我正在使用Hive v0.12.0,HDP v2.0.6,HortonWorks Hadoop发行版.
Iv had a look at the docs but there so cryptic I have no idea how to work out how to do it. Im using Hive v0.12.0, HDP v2.0.6, HortonWorks Hadoop distribution.
推荐答案
Similar to @VB_ answer, use the BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
statement.
HiveQL查询因此是:
The HiveQL query is therefore:
SELECT key, product_code,
SUM(costs) OVER (PARTITION BY key ORDER BY key ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM test;
这篇关于蜂巢:指定组的总和(HiveQL)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!