Find the top 10 latest records for each BUYER_ID for yesterday's date
Problem description
This is the table:
CREATE TABLE IF NOT EXISTS TestingTable1
(
BUYER_ID BIGINT,
ITEM_ID BIGINT,
CREATED_TIME STRING
)
And this is the data in that table:
BUYER_ID | ITEM_ID | CREATED_TIME
------------+------------------+-----------------------
1015826235 220003038067 2012-07-09 19:40:21,
1015826235 300003861266 2012-07-09 18:19:59,
1015826235 140002997245 2012-07-09 09:23:17,
1015826235 210002448035 2012-07-09 22:21:11,
1015826235 260003553381 2012-07-09 07:09:56,
1015826235 260003553382 2012-07-09 19:40:39,
1015826235 260003553383 2012-07-09 06:58:47,
1015826235 260003553384 2012-07-09 07:28:47,
1015826235 260003553385 2012-07-09 08:48:47,
1015826235 260003553386 2012-07-09 06:38:47,
1015826235 260003553387 2012-07-09 05:38:47,
1015826235 260003553388 2012-07-09 04:55:47,
1015826235 260003553389 2012-07-09 06:54:37,
34512201 597245693 2012-07-09 16:20:21,
34512201 8071787728 2012-07-09 15:19:59,
34512201 5868222883 2012-07-09 08:23:17,
34512201 2412180494 2012-07-09 22:21:11,
34512201 2422054205 2012-07-09 06:09:56,
34512201 1875744030 2012-07-09 19:40:39,
34512201 5639158173 2012-07-09 06:58:47,
34512201 5656232360 2012-07-09 07:28:47,
34512201 959188449 2012-07-09 08:48:47,
34512201 4645350592 2012-07-09 06:38:47,
34512201 5657320532 2012-07-09 05:38:47,
34512201 290419656539 2012-07-09 04:55:47,
As the data above shows, there are only two unique BUYER_ID values, each with its own ITEM_ID and CREATED_TIME values. Whenever I run this query, I need only the 10 latest records for each BUYER_ID, based on CREATED_TIME, for the day before today (that is, yesterday's date).
So for BUYER_ID 34512201, for example, I need its 10 latest records based on CREATED_TIME, for yesterday's date only.
Each BUYER_ID can have data for any day, but I am only interested in yesterday's data, identified by checking CREATED_TIME. The goal is to find the TOP 10 latest records for each BUYER_ID for yesterday's date. Below is the sample output I should get for each BUYER_ID.
Sample output:
BUYER_ID | ITEM_ID | CREATED_TIME
------------+------------------+-----------------------
34512201 2412180494 2012-07-09 22:21:11
34512201 1875744030 2012-07-09 19:40:39
34512201 597245693 2012-07-09 16:20:21
34512201 8071787728 2012-07-09 15:19:59
34512201 959188449 2012-07-09 08:48:47
34512201 5868222883 2012-07-09 08:23:17
34512201 5656232360 2012-07-09 07:28:47
34512201 5639158173 2012-07-09 06:58:47
34512201 4645350592 2012-07-09 06:38:47
34512201 2422054205 2012-07-09 06:09:56
1015826235 210002448035 2012-07-09 22:21:11
1015826235 260003553382 2012-07-09 19:40:39
1015826235 220003038067 2012-07-09 19:40:21
1015826235 300003861266 2012-07-09 18:19:59
1015826235 140002997245 2012-07-09 09:23:17
1015826235 260003553385 2012-07-09 08:48:47
1015826235 260003553384 2012-07-09 07:28:47
1015826235 260003553381 2012-07-09 07:09:56
1015826235 260003553383 2012-07-09 06:58:47
1015826235 260003553389 2012-07-09 06:54:37
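The selection rule described above (yesterday's rows only, then the 10 newest per buyer) can be sketched outside Hive. The following is a minimal plain-Java illustration, not Hive code. The names Row, LatestPerBuyer and latest are hypothetical, and the sketch assumes every CREATED_TIME string uses the fixed-width yyyy-MM-dd HH:mm:ss layout seen in the sample data, so that comparing the strings is the same as comparing the times. The main method pins "today" to 2012-07-10 so that the sample rows count as yesterday; a real run would presumably use LocalDate.now().

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper type mirroring one row of TestingTable1.
final class Row {
    final long buyerId;
    final long itemId;
    final String createdTime; // "yyyy-MM-dd HH:mm:ss", as in the STRING column

    Row(long buyerId, long itemId, String createdTime) {
        this.buyerId = buyerId;
        this.itemId = itemId;
        this.createdTime = createdTime;
    }
}

final class LatestPerBuyer {
    // Keeps rows whose CREATED_TIME falls on `day`, then the `limit` newest per buyer.
    static Map<Long, List<Row>> latest(List<Row> rows, LocalDate day, int limit) {
        String dayPrefix = day + " "; // LocalDate prints as yyyy-MM-dd
        Map<Long, List<Row>> byBuyer = new LinkedHashMap<>();
        for (Row r : rows) {
            if (r.createdTime.startsWith(dayPrefix)) {
                byBuyer.computeIfAbsent(r.buyerId, k -> new ArrayList<>()).add(r);
            }
        }
        for (Map.Entry<Long, List<Row>> e : byBuyer.entrySet()) {
            List<Row> newestFirst = e.getValue();
            // Assumption: fixed-width timestamps sort chronologically as plain strings.
            newestFirst.sort(Comparator.comparing((Row r) -> r.createdTime).reversed());
            int keep = Math.min(limit, newestFirst.size());
            e.setValue(new ArrayList<>(newestFirst.subList(0, keep)));
        }
        return byBuyer;
    }

    public static void main(String[] args) {
        // A few rows taken from the sample data above.
        List<Row> rows = Arrays.asList(
            new Row(34512201L, 597245693L, "2012-07-09 16:20:21"),
            new Row(34512201L, 8071787728L, "2012-07-09 15:19:59"),
            new Row(34512201L, 2412180494L, "2012-07-09 22:21:11"),
            new Row(1015826235L, 220003038067L, "2012-07-09 19:40:21"));
        LocalDate today = LocalDate.of(2012, 7, 10); // assumption for the demo
        for (List<Row> group : latest(rows, today.minusDays(1), 2).values()) {
            for (Row r : group) {
                System.out.println(r.buyerId + " " + r.itemId + " " + r.createdTime);
            }
        }
    }
}
```

The sketch filters by date before ranking, so rows from other days never compete for the 10 slots. A Hive query for the same result would need to apply its date check before the per-buyer limit in the same way.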
I am working with Hive, and Hive supports SQL-like syntax, so the SQL needs to work in a Hive environment too.
Can anyone help me with this?
Update:
I am using the query below. I need to get the top 10 latest records from it and add one more condition, in a WHERE clause, that checks for yesterday's date. I cannot use TOP 10 here because Hive doesn't support the TOP 10 SQL syntax, so I need some other way to solve this.
SELECT * FROM TestingTable1 ORDER BY buyer_id, created_time DESC;
One more update:
I wrote the query below using a RANK user-defined function (UDF).
SELECT buyer_id, item_id, created_time, rk
FROM (
SELECT buyer_id, item_id, rank(item_id) as rk, created_time
FROM testingtable1
DISTRIBUTE BY buyer_id, item_id
SORT BY buyer_id, item_id, created_time desc
) a
WHERE rk < 10
ORDER BY buyer_id, created_time, rk;
And this is the RANK UDF in Java:
package com.example.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;

public final class Rank extends UDF {
    private int counter;
    private String last_key;

    // Restarts the counter whenever the key differs from the previous call's key.
    public int evaluate(final String key) {
        if (!key.equalsIgnoreCase(this.last_key)) {
            this.counter = 0;
            this.last_key = key;
        }
        return this.counter++;
    }
}
The query above does not work the way I want; I guess it needs some sort of tweak.
Is there any way to do this in HiveQL without using a UDF?
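One possible explanation for the behaviour, offered as a sketch rather than a confirmed diagnosis: the UDF restarts its counter whenever its argument changes, and the query calls it as rank(item_id). Every ITEM_ID in the sample data is distinct, so the counter would restart on every row, rk would always be 0, and WHERE rk < 10 would filter nothing. Passing buyer_id as the key, with rows distributed by buyer_id and sorted by buyer_id, created_time DESC, would instead number the rows within each buyer. The stand-alone copy of the counter below has no Hive dependency; RankCounter and rankAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-alone copy of the counter logic from the Rank UDF above,
// so its behaviour can be tried on a plain list of keys.
final class RankCounter {
    private int counter;
    private String lastKey;

    // Returns 0 for the first row of each run of equal keys, then 1, 2, ...
    int evaluate(String key) {
        if (!key.equalsIgnoreCase(lastKey)) {
            counter = 0;
            lastKey = key;
        }
        return counter++;
    }

    // Feeds the keys in the order given, as rows would arrive after sorting.
    static List<Integer> rankAll(List<String> keysInSortedOrder) {
        RankCounter rank = new RankCounter();
        List<Integer> out = new ArrayList<>();
        for (String key : keysInSortedOrder) {
            out.add(rank.evaluate(key));
        }
        return out;
    }

    public static void main(String[] args) {
        // Keyed on item_id: every key differs, so the counter restarts on each row.
        System.out.println(rankAll(Arrays.asList(
            "220003038067", "300003861266", "140002997245"))); // prints [0, 0, 0]
        // Keyed on buyer_id: the counter keeps growing within one buyer's run.
        System.out.println(rankAll(Arrays.asList(
            "1015826235", "1015826235", "34512201", "34512201"))); // prints [0, 1, 0, 1]
    }
}
```

Because the counter starts at 0, rk < 10 keeps ranks 0 through 9, which is 10 rows per key. On Hive 0.11 and later, the windowing function ROW_NUMBER() OVER (PARTITION BY buyer_id ORDER BY created_time DESC) gives a comparable per-buyer numbering without a UDF, starting at 1 rather than 0.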
SELECT *
FROM TestingTable1
WHERE buyer_id = 34512201
ORDER BY created_time DESC
LIMIT 10;