Group by one combination and once again group it for other items

Problem description


Folks,

We have the following data and need the following output.

 CUSTOMER_NAME PRODUCT_NAME PRICE OCCURANCE ID
 customer1,    product1,    20,       1
 customer1,    product2,    30,       2
 customer1,    product1,    25,       3
 customer1,    product1,    20,       1
 customer1,    product2,    20,       2
 customer1,    product2,    30,       2

First, we need to average the price per occurrence ID:

 customer1,product1,20 (AVG is 20 for occurrence 1), 1
 customer1,product1,25 (AVG is 25 for occurrence 3), 3
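Here is a minimal sketch of this first step in plain Python, using the six sample rows above. The names `ROWS` and `per_occurrence_avg` are hypothetical. The sketch only illustrates the grouping and does not show how Hive executes it. Each (customer, product, occurrence) key gets the mean of its prices.

```python
from collections import defaultdict

# Sample rows from the question: (customer_name, product_name, price, occurance_id)
ROWS = [
    ("customer1", "product1", 20, 1),
    ("customer1", "product2", 30, 2),
    ("customer1", "product1", 25, 3),
    ("customer1", "product1", 20, 1),
    ("customer1", "product2", 20, 2),
    ("customer1", "product2", 30, 2),
]


def per_occurrence_avg(rows):
    """Return the average price for each (customer, product, occurrence) key."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for customer, product, price, occ in rows:
        key = (customer, product, occ)
        sums[key] += price
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    for key, avg in sorted(per_occurrence_avg(ROWS).items()):
        print(key, avg)
```

The output has 20.0 and 25.0 for product1 (occurrences 1 and 3). It also has an average for product2's single occurrence 2, which the expected output in the question does not list.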

Next, we have to average those results again by customer name and product name (the occurrence ID is left out of this GROUP BY).

Final output: customer1, product1, average of the per-occurrence average prices.

customer1, product1, (20 + 25) / 2 = 22.5
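The worked numbers for product1 show why a single GROUP BY customer_name, product_name does not give the requested 22.5. The average of averages weights each occurrence equally, so it differs from a plain average over the three product1 rows.

```latex
\begin{aligned}
\bar{p}_{\text{occ }1} &= \frac{20 + 20}{2} = 20, \qquad
\bar{p}_{\text{occ }3} = \frac{25}{1} = 25,\\
\text{average of averages} &= \frac{\bar{p}_{\text{occ }1} + \bar{p}_{\text{occ }3}}{2}
  = \frac{20 + 25}{2} = 22.5,\\
\text{plain average} &= \frac{20 + 20 + 25}{3} = \frac{65}{3} \approx 21.67.
\end{aligned}
```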

Basically, how do we compute an average of averages in Hive? We have not been able to write a query for this.

Solution

Hi, this can be achieved with a nested query, as follows:

First step: calculate the initial average price for each occurance_id (per customer and product):

SELECT customer_name, product_name,occurance_id, avg(price) as avg_of_current_occurance
FROM customer_info
GROUP BY customer_name,product_name,occurance_id ;
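You can try the first step without a Hive cluster by running the same query against an in-memory SQLite database from Python. SQLite is only a local stand-in here, assumed to behave like Hive for this simple GROUP BY with `avg`. The `run_step_one` helper is hypothetical. The ORDER BY clause is added only to make the row order deterministic.

```python
import sqlite3

# Sample rows from the question: (customer_name, product_name, price, occurance_id)
ROWS = [
    ("customer1", "product1", 20, 1),
    ("customer1", "product2", 30, 2),
    ("customer1", "product1", 25, 3),
    ("customer1", "product1", 20, 1),
    ("customer1", "product2", 20, 2),
    ("customer1", "product2", 30, 2),
]

# First step from the answer; ORDER BY added only for a stable row order.
STEP_ONE = """
SELECT customer_name, product_name, occurance_id,
       avg(price) AS avg_of_current_occurance
FROM customer_info
GROUP BY customer_name, product_name, occurance_id
ORDER BY customer_name, product_name, occurance_id
"""


def run_step_one(rows):
    """Load rows into an in-memory customer_info table and run the first step."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE customer_info ("
            "customer_name TEXT, product_name TEXT, "
            "price REAL, occurance_id INTEGER)"
        )
        conn.executemany("INSERT INTO customer_info VALUES (?, ?, ?, ?)", rows)
        return conn.execute(STEP_ONE).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_step_one(ROWS):
        print(row)
```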

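Second step: calculate the average of the averages returned by the first step. Wrap the first query in a subquery (aliased W) and group only by customer_name and product_name: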

hive (default)>
              > SELECT customer_name, product_name,avg(avg_of_current_occurance) as final_avg
              > FROM(
              > SELECT customer_name, product_name,occurance_id, avg(price) as avg_of_current_occurance
              > FROM customer_info
              > GROUP BY customer_name,product_name,occurance_id
              > ) W
              > GROUP BY customer_name,product_name;

Total MapReduce jobs = 1
Launching Job 1 out of 1

Execution completed successfully

customer_name   product_name    final_avg
customer1       product1        22.5
customer1       product2        26.666666666666668   
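The sketch below reproduces the nested query locally, again assuming SQLite as a stand-in for Hive. The helper `run` and the `PLAIN_AVG` comparison query are hypothetical additions. `PLAIN_AVG` is a single-level GROUP BY customer_name, product_name, included so the two results can be compared.

```python
import sqlite3

# Sample rows from the question: (customer_name, product_name, price, occurance_id)
ROWS = [
    ("customer1", "product1", 20, 1),
    ("customer1", "product2", 30, 2),
    ("customer1", "product1", 25, 3),
    ("customer1", "product1", 20, 1),
    ("customer1", "product2", 20, 2),
    ("customer1", "product2", 30, 2),
]

# Same shape as the Hive query: the inner query averages per occurance_id,
# the outer query averages those averages per customer and product.
NESTED_AVG = """
SELECT customer_name, product_name, avg(avg_of_current_occurance) AS final_avg
FROM (
    SELECT customer_name, product_name, occurance_id,
           avg(price) AS avg_of_current_occurance
    FROM customer_info
    GROUP BY customer_name, product_name, occurance_id
) W
GROUP BY customer_name, product_name
ORDER BY customer_name, product_name
"""

# Hypothetical comparison: a single-level average over all rows.
PLAIN_AVG = """
SELECT customer_name, product_name, avg(price) AS plain_avg
FROM customer_info
GROUP BY customer_name, product_name
ORDER BY customer_name, product_name
"""


def run(query, rows):
    """Load rows into an in-memory customer_info table and run one query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE customer_info ("
            "customer_name TEXT, product_name TEXT, "
            "price REAL, occurance_id INTEGER)"
        )
        conn.executemany("INSERT INTO customer_info VALUES (?, ?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(NESTED_AVG, ROWS))
    print(run(PLAIN_AVG, ROWS))
```

With the sample rows, the nested query returns 22.5 for product1. The single-level query returns 65/3 (about 21.67), because occurrence 1 has two rows and so gets double weight in the plain average. For product2 both queries agree, since it has only one occurrence.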
