Unable to select top 10 records per group in Spark SQL


Question

Hi, I am new to Spark SQL. I have a data frame like this:

 +-----+----------+-------+-----+------+----+
 |tagid| timestamp|listner|orgid|org2id|RSSI|
 +-----+----------+-------+-----+------+----+
 |    4|1496745912|    362|    4|     3|0.60|
 |    4|1496745924|   1901|    4|     3|0.60|
 |    4|1496746030|   1901|    4|     3|0.60|
 |    4|1496746110|    718|    4|     3|0.30|
 |    2|1496746128|    718|    4|     3|0.60|
 |    2|1496746188|   1901|    4|     3|0.10|
 +-----+----------+-------+-----+------+----+

I want to select the top 10 timestamp values for each listner in Spark SQL.

I tried the following queries. The first throws an error, and the second returns only 10 records overall:

  val avg = sqlContext.sql("select top 10 * from avg_table") // throws error.

  val avg = sqlContext.sql("select rssi,timestamp,tagid from avg_table order by desc limit 10")  // it prints only 10 records.

For each listner I need to take the top 10 timestamp values. Any help will be appreciated.

Answer

Doesn't this work?

select rssi, timestamp, tagid
from avg_table
order by timestamp desc
limit 10;

Oh, I get it. You want row_number():

select rssi, timestamp, tagid
from (select a.*,
             row_number() over (partition by listner order by timestamp desc) as seqnum
      from avg_table
     ) a
where seqnum <= 10
order by a.timestamp desc;
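The same top-N-per-group logic that `row_number() over (partition by listner order by timestamp desc)` expresses can be sketched in plain Python (no Spark; the tuple layout and `top_n_per_group` helper here are illustrative, not part of the question's code): group rows by the partition key, sort each group by timestamp descending, and keep the first N rows per group.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (tagid, timestamp, listner, rssi)
rows = [
    (4, 1496745912, 362, 0.60),
    (4, 1496745924, 1901, 0.60),
    (4, 1496746030, 1901, 0.60),
    (4, 1496746110, 718, 0.30),
    (2, 1496746128, 718, 0.60),
    (2, 1496746188, 1901, 0.10),
]

def top_n_per_group(rows, n=10):
    # Order by the partition key (listner, index 2), then by
    # timestamp (index 1) descending within each partition.
    ordered = sorted(rows, key=lambda r: (r[2], -r[1]))
    result = []
    for _, group in groupby(ordered, key=itemgetter(2)):
        # Keep rows whose row_number within the partition is <= n.
        result.extend(list(group)[:n])
    return result

print(top_n_per_group(rows, n=2))
```

With `n=2`, listner 1901 (three rows) keeps only its two newest timestamps, while listners 362 and 718 keep all of theirs; the subquery-plus-`where seqnum <= 10` SQL above does exactly this per partition.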
