Group by one combination and once again group it for other items
Problem description
Folks,
We have the following data and need the output described below.
CUSTOMER_NAME, PRODUCT_NAME, PRICE, OCCURANCE_ID
customer1, product1, 20, 1
customer1, product2, 30, 2
customer1, product1, 25, 3
customer1, product1, 20, 1
customer1, product2, 20, 2
customer1, product2, 30, 2
First we need to average the price by occurrence ID.
customer1, product1, 20 (average is 20 for occurrence 1), 1
customer1, product1, 25 (average is 25 for occurrence 3), 3
Then we need to average those results again by customer name and product name (the occurrence ID is left out of this GROUP BY).
Final output: customer1, product1, average price across all occurrences.
customer1, product1, (20 + 25) / 2 = 22.5
Basically, how do we compute an average of averages in Hive? We have not been able to write a query for this.
Solution
This can be achieved with a nested query, as follows.
First step: calculate the initial average price per occurance_id
SELECT customer_name, product_name,occurance_id, avg(price) as avg_of_current_occurance
FROM customer_info
GROUP BY customer_name,product_name,occurance_id ;
Second step: calculate the average of the averages returned by the first step
hive (default)>
> SELECT customer_name, product_name,avg(avg_of_current_occurance) as final_avg
> FROM(
> SELECT customer_name, product_name,occurance_id, avg(price) as avg_of_current_occurance
> FROM customer_info
> GROUP BY customer_name,product_name,occurance_id
> ) W
> GROUP BY customer_name,product_name;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Execution completed successfully
customer_name product_name final_avg
customer1 product1 22.5
customer1 product2 26.666666666666668
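To check the nested query without a Hive cluster, the sketch below runs the same SQL against an in-memory SQLite database from Python. This is only an illustration under stated assumptions: SQLite stands in for Hive, the column types (TEXT, REAL, INTEGER) and the helper name `run` are made up for the example, and only the six sample rows from the question are loaded. It also runs a flat `avg(price)` per customer and product to show that the two approaches can give different answers.

```python
import sqlite3

# Sample rows from the question:
# (customer_name, product_name, price, occurance_id)
ROWS = [
    ("customer1", "product1", 20, 1),
    ("customer1", "product2", 30, 2),
    ("customer1", "product1", 25, 3),
    ("customer1", "product1", 20, 1),
    ("customer1", "product2", 20, 2),
    ("customer1", "product2", 30, 2),
]

# Same shape as the Hive query: average per occurrence in the inner
# query, then average those averages per customer and product.
NESTED_QUERY = """
SELECT customer_name, product_name,
       avg(avg_of_current_occurance) AS final_avg
FROM (
    SELECT customer_name, product_name, occurance_id,
           avg(price) AS avg_of_current_occurance
    FROM customer_info
    GROUP BY customer_name, product_name, occurance_id
) w
GROUP BY customer_name, product_name
ORDER BY customer_name, product_name
"""

# Flat average over all rows, for comparison only.
FLAT_QUERY = """
SELECT customer_name, product_name, avg(price) AS flat_avg
FROM customer_info
GROUP BY customer_name, product_name
ORDER BY customer_name, product_name
"""


def run(query):
    """Load the sample rows into a fresh in-memory table and run one query.

    Hypothetical helper for this example; the column types are assumed.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE customer_info ("
            "customer_name TEXT, product_name TEXT, "
            "price REAL, occurance_id INTEGER)"
        )
        conn.executemany("INSERT INTO customer_info VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


nested = run(NESTED_QUERY)
flat = run(FLAT_QUERY)

for row in nested:
    print("average of averages:", row)
for row in flat:
    print("flat average:       ", row)
```

For product1 the nested query gives 22.5, while the flat average is 65 / 3 (about 21.67), because occurrence 1 contributes two rows and occurrence 3 only one. For product2 every row belongs to occurrence 2, so both queries return the same value.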