HiveQL和rank（） [英] HiveQL and rank()

查看：96 发布时间：2018/5/31 18:47:41 hadoop hive hiveql

本文介绍了HiveQL和rank（）的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我无法理解HiveQL的排名（）。我在WWW上发现了一些排名UDF的实现，例如爱德华的好例子。我可以加载和访问函数，但我无法让他们做我想做的事。下面是一个详细的例子：

将UDF加载到CLI过程中：

  $ javac -classpath /home/hadoop/hadoop/hadoop-core-1.0.4.jar:/home/hadoop/hive/lib/hive-exec-0.10.0.jar com / m6d / hiveudf / Rank2。 java 
 $ jar -cvf Rank2.jar com / m6d / hiveudf / Rank2.class 
 hive> ADD JAR /home/hadoop/MyDemo/Rank2.jar; 
 hive> CREATE TEMPORARY FUNCTION Rank2 AS'com.m6d.hiveudf.Rank2';

创建表：

 创建表格购买（
 SalesRepId字符串，
 PurchaseOrderId INT，
金额INT 
）
行格式限定
字段终止'，'
'\\\
'终止的线路;

从CSV中加载数据：

  Jana，1,100 
 Nadia，2,200 
 Nadia，3,600 
 Daniel，4,80 
 Jana，5,120 
 William，6,170 
 Daniel，7,140

通过CLI提供这个功能：

  LOAD DATA 
 LOCAL INPATH'/home/hadoop/MyDemo/purchases.csv'
 INTO TABLE购买;

现在我可以看到我的最高销售代表：

 选择SalesRepId，将金额（金额）作为批量
从采购
组中通过SalesRepId 
 ORDER BY批量DESC;

Nadia售出了800美元的东西，Daniel和Jana都卖出了220美元，而William卖出了170美元

  SalesRep金额
 -------- ------ 
 Nadia 800 
 Daniel 220 
 Jana 220 
 William 170

现在我想要要做的是数字他们：纳迪亚是第一，丹尼尔和贾纳并列第二，威廉是第四（不＃3）

 <$从
中选择SalesRepId，V.volume，rank2（V.volume）
（选择SalesRepId，总额（金额）作为批量
从购买
组中通过SalesRepId 
 ORDER BY volume DESC）V;

这是我得到的，但不是我想要的：

  SalesRep金额等级
 -------- ------ ---- 
 Nadia 800 1 
 Daniel 220 1 
 Jana 220 2 
 William 170 1

这就是我想，但我不能让配置单元为我做：

  SalesRep金额等级
 ---- ---- ------ ---- 
 Nadia 800 1 
 Daniel 220 2 
 Jana 220 2 
 William 170 4 
  
 你能帮我用正确的HiveQL来排列我的销售代表吗？
 
 
 感谢JtheRocker的回应。他的变化导致了这个列表： 
 
 
  SalesRep金额等级
 -------- ---- -  ---- 
 William 170 1 
 Daniel 220 2 
 Jana 220 2 
 Nadia 800 3 
  
稍微修改显示Nadia为4（不是第3个）： 
 
 
 私人row_number; 
 @Override 
 public Object evaluate（DeferredObject [] currentKey）throws HiveException {
 row_number ++; 
 if（！sameAsPreviousKey（currentKey））{
 this.counter = row_number; 
 copyToPreviousKey（currentKey）; 
} 
 return new Long（this.counter）; 
 
  
 
 
解决方案
在Hive 0.11中引入的窗口化和分析功能，您可以使用：
 $ b  
 
 pre $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $金额）作为来自采购组的销售量由SalesRepId）V;

I can't understand HiveQL rank(). I've found a couple of implementations of rank UDF's on the WWW, such as Edward's nice example. I can load and access the functions, but I can't get them to do what I want. Here is a detailed example:

Loading the UDF into the CLI process:
$ javac -classpath /home/hadoop/hadoop/hadoop-core-1.0.4.jar:/home/hadoop/hive/lib/hive-exec-0.10.0.jar com/m6d/hiveudf/Rank2.java $ jar -cvf Rank2.jar com/m6d/hiveudf/Rank2.class hive> ADD JAR /home/hadoop/MyDemo/Rank2.jar; hive> CREATE TEMPORARY FUNCTION Rank2 AS 'com.m6d.hiveudf.Rank2';
Create a table:
create table purchases ( SalesRepId String, PurchaseOrderId INT, Amount INT ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
Load data from this CSV:
Jana,1,100 Nadia,2,200 Nadia,3,600 Daniel,4,80 Jana,5,120 William,6,170 Daniel,7,140
With this from the CLI:
LOAD DATA LOCAL INPATH '/home/hadoop/MyDemo/purchases.csv' INTO TABLE purchases;
Now I can see my top Sales Reps:
select SalesRepId,sum(amount) as volume from purchases group by SalesRepId ORDER BY volume DESC;
Nadia has sold $800 of stuff, Daniel and Jana have both sold $220, and William has sold $170
SalesRep Amount -------- ------ Nadia 800 Daniel 220 Jana 220 William 170
Now all I want to do is number them: Nadia is #1, Daniel and Jana are tied for #2, and William is #4 (not #3)
select SalesRepId, V.volume,rank2(V.volume) from (select SalesRepId,sum(amount) as volume from purchases group by SalesRepId ORDER BY volume DESC) V;
This is what I get, but NOT what I want:
SalesRep Amount Rank -------- ------ ---- Nadia 800 1 Daniel 220 1 Jana 220 2 William 170 1
This is what I WANT, but I can't make hive do it for me:
SalesRep Amount Rank -------- ------ ---- Nadia 800 1 Daniel 220 2 Jana 220 2 William 170 4
Can you help me with the correct HiveQL to rank my Sales Reps?

Thanks to JtheRocker for his response. His change resulted in this list:
SalesRep Amount Rank -------- ------ ---- William 170 1 Daniel 220 2 Jana 220 2 Nadia 800 3
A slight modification to show Nadia as 4th (not 3rd):
private row_number; @Override public Object evaluate(DeferredObject[] currentKey) throws HiveException { row_number++; if (!sameAsPreviousKey(currentKey)) { this.counter = row_number; copyToPreviousKey(currentKey); } return new Long(this.counter); }

解决方案
With the Windowing and Analytics functions introduced in Hive 0.11, you can use:
select SalesRepId, volume as amount , rank() over (order by V.volume desc) as rank from (select SalesRepId,sum(amount) as volume from purchases group by SalesRepId) V;

这篇关于HiveQL和rank（）的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

HiveQL和rank（） [英] HiveQL and rank()

问题描述

相关文章

分布式计算/Hadoop最新文章

热门教程

热门工具

登录关闭

HiveQL和rank（） [英] HiveQL and rank()

问题描述

相关文章

分布式计算/Hadoop最新文章

热门教程

热门工具

登录 关闭

登录关闭