为每个 BUYER_ID 查找昨天日期的 TOP 10 最新记录 [英] Find TOP 10 latest record for each BUYER_ID for yesterday's date

查看:26
本文介绍了为每个 BUYER_ID 查找昨天日期的 TOP 10 最新记录的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是下表

CREATE TABLE IF NOT EXISTS TestingTable1 
( 
BUYER_ID BIGINT,
ITEM_ID BIGINT, 
CREATED_TIME STRING
)

这是上表中的以下数据-

And this is the below data in the above table-

BUYER_ID    |    ITEM_ID       |    CREATED_TIME
------------+------------------+-----------------------
1015826235      220003038067        2012-07-09 19:40:21,
1015826235      300003861266        2012-07-09 18:19:59,
1015826235      140002997245        2012-07-09 09:23:17,
1015826235      210002448035        2012-07-09 22:21:11,
1015826235      260003553381        2012-07-09 07:09:56,
1015826235      260003553382        2012-07-09 19:40:39,
1015826235      260003553383        2012-07-09 06:58:47,
1015826235      260003553384        2012-07-09 07:28:47,
1015826235      260003553385        2012-07-09 08:48:47,
1015826235      260003553386        2012-07-09 06:38:47,
1015826235      260003553387        2012-07-09 05:38:47,
1015826235      260003553388        2012-07-09 04:55:47,
1015826235      260003553389        2012-07-09 06:54:37,
34512201        597245693           2012-07-09 16:20:21,
34512201        8071787728          2012-07-09 15:19:59,
34512201        5868222883          2012-07-09 08:23:17,
34512201        2412180494          2012-07-09 22:21:11,
34512201        2422054205          2012-07-09 06:09:56,
34512201        1875744030          2012-07-09 19:40:39,
34512201        5639158173          2012-07-09 06:58:47,
34512201        5656232360          2012-07-09 07:28:47, 
34512201        959188449           2012-07-09 08:48:47,
34512201        4645350592          2012-07-09 06:38:47,
34512201        5657320532          2012-07-09 05:38:47,
34512201        290419656539        2012-07-09 04:55:47,

如果你在表格中看到上面的数据,只有两个UNIQUE BUYER_ID 和我有ITEM_ID AND CREATED_TIME 的对应.每当我为每个 BUYER_ID 触发此查询(意思是昨天的日期)时,我只需要基于今天日期前一天的时间的 10 个最新记录.

If you see the above data in the table, there are only two UNIQUE BUYER_ID and corresponding to those I have ITEM_ID AND CREATED_TIME. I need only 10 latest record basis on the time for the day before today's date whenever I will be firing this query (meaning yesterday's date) for each BUYER_ID.

所以对于这个 BUYER_ID - 34512201 我需要每个 BUYER_ID 的 10 个最新记录,基于昨天的 CREATED_TIME仅限日期.

So for this BUYER_ID - 34512201 I need 10 latest record for each BUYER_ID basis on CREATED_TIME for yesterday's date only.

并且每个 BUYER_ID 可以有任何一天的数据.但是我通过检查 CREATED_TIME

And each BUYER_ID can have any day's data. But I am specifically interested for day before today's data(means yesterday's date always) by checking at the CREATED_TIME

为每个 BUYER_ID 查找昨天日期的 TOP 10 最新数据.下面是我应该得到的与每个 BUYER_ID 相对应的示例输出.

Find TOP 10 latest data for each BUYER_ID for yesterday's date. Below is the sample output I should be getting corresponding to each BUYER_ID.

样本输出.

BUYER_ID    |    ITEM_ID       |    CREATED_TIME
------------+------------------+-----------------------
34512201        2412180494          2012-07-09 22:21:11
34512201        1875744030          2012-07-09 19:40:39
34512201        597245693           2012-07-09 16:20:21
34512201        8071787728          2012-07-09 15:19:59
34512201        959188449           2012-07-09 08:48:47
34512201        5868222883          2012-07-09 08:23:17
34512201        5656232360          2012-07-09 07:28:47 
34512201        5639158173          2012-07-09 06:58:47
34512201        4645350592          2012-07-09 06:38:47
34512201        2422054205          2012-07-09 06:09:56
1015826235      210002448035        2012-07-09 22:21:11
1015826235      260003553382        2012-07-09 19:40:39
1015826235      220003038067        2012-07-09 19:40:21
1015826235      300003861266        2012-07-09 18:19:59
1015826235      140002997245        2012-07-09 09:23:17
1015826235      260003553385        2012-07-09 08:48:47
1015826235      260003553384        2012-07-09 07:28:47
1015826235      260003553381        2012-07-09 07:09:56
1015826235      260003553383        2012-07-09 06:58:47
1015826235      260003553389        2012-07-09 06:54:37

我正在使用 Hive 并且 Hive 支持类似 SQL 的语法.所以我需要确保 SQL 也应该在 Hive 环境中工作.

I am working with Hive and Hive supports SQL like syntax. So I need to make sure the SQL should work in Hive environment too.

谁能帮我解决这个问题?

Can anyone help me with this?

更新:-

我正在使用下面的查询,我需要从下面的查询中获取前 10 个最新的,并且需要添加一个更多的限定符用于日期检查,这意味着在 where 子句中用于昨天的日期 - 我不能使用TOP 10 在这里,因为 Hive 不支持 TOP 10 sql 语法.我需要一些其他方法来解决这个问题.

I am using the below query and I need to get top 10 latest from the below query and need to add one more qualifier for date check, means in where clause for yesterday's date- I cannot use TOP 10 here as Hive doesn't support TOP 10 sql syntax. I need some other way to do this problem.

SELECT * FROM TestingTable1 WHERE ORDER BY buyer_id, created_time DESC;

另外一个更新:-

我使用 RANK UserDefinedFunction 编写了以下查询.

I wrote this below query with the use of RANK UserDefinedFunction.

SELECT buyer_id, item_id, created_time, rk
FROM (
    SELECT buyer_id, item_id, rank(item_id) as rk, created_time
    FROM testingtable1
    DISTRIBUTE BY buyer_id, item_id
    SORT BY buyer_id, item_id, created_time desc
) a 
WHERE rk < 10
ORDER BY buyer_id, created_time, rk;

这是Java中的RANK UDF函数-

And this is the RANK UDF functions in Java-

package com.example.hive.udf;
import org.apache.hadoop.hive.ql.exec.UDF;

    public final class Rank extends UDF{
        private int  counter;
        private String last_key;
        public int evaluate(final String key){
          if ( !key.equalsIgnoreCase(this.last_key) ) {
             this.counter = 0;
             this.last_key = key;
          }
          return this.counter++;
        }
    }

上面的查询不是我想要的方式,我猜在那个查询中必须进行某种扭曲.

And above query is not work the way I wanted to, some sort of twist has to be made I guess in that query.

有没有办法在 HiveQL 中不使用任何 UDF 来做到这一点?

推荐答案

  SELECT FIRST 10 *
    FROM TestingTable1
   WHERE buyer_id = 34512201
ORDER BY created_time DESC;

这篇关于为每个 BUYER_ID 查找昨天日期的 TOP 10 最新记录的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆