Unable to select top 10 records per group in Spark SQL
Problem description
Hi, I am new to Spark SQL. I have a data frame like this:
+-----+----------+-------+-----+------+----+
|tagid|timestamp |listner|orgid|org2id|RSSI|
+-----+----------+-------+-----+------+----+
|    4|1496745912|    362|    4|     3|0.60|
|    4|1496745924|   1901|    4|     3|0.60|
|    4|1496746030|   1901|    4|     3|0.60|
|    4|1496746110|    718|    4|     3|0.30|
|    2|1496746128|    718|    4|     3|0.60|
|    2|1496746188|   1901|    4|     3|0.10|
+-----+----------+-------+-----+------+----+
I want to select the top 10 timestamp values for each listner in Spark SQL.
I tried the following queries. The first throws an error, and the second returns only 10 records overall rather than 10 per listner.
val avg = sqlContext.sql("select top 10 * from avg_table") // throws an error: "top" is not valid Spark SQL syntax
val avg = sqlContext.sql("select rssi,timestamp,tagid from avg_table order by timestamp desc limit 10") // returns only 10 records in total, not 10 per listner
For each listner, I need to take the top 10 timestamp values. Any help will be appreciated.
Recommended answer
Doesn't this work?
select rssi, timestamp, tagid
from avg_table
order by timestamp desc
limit 10;
Oh, I get it. You want row_number():
select rssi, timestamp, tagid
from (select a.*,
             row_number() over (partition by listner order by timestamp desc) as seqnum
      from avg_table a
     ) a
where seqnum <= 10
order by timestamp desc;
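To make the window-function logic concrete, here is a minimal sketch of what row_number() over a partition computes, written against a plain Scala collection rather than Spark (the `Reading` case class and `topNPerListner` helper are illustrative names, not from the original post):

```scala
// Sketch of the per-group top-N logic that row_number() expresses in SQL,
// on a plain Scala collection. No Spark required; names are illustrative.
case class Reading(tagid: Int, timestamp: Long, listner: Int, rssi: Double)

def topNPerListner(rows: Seq[Reading], n: Int): Seq[Reading] =
  rows.groupBy(_.listner)                        // partition by listner
      .values
      .flatMap(_.sortBy(-_.timestamp).take(n))   // order by timestamp desc, keep first n
      .toSeq
```

A window function in Spark does the same grouping and ranking per partition, but distributed, and without collapsing the rows the way a `group by` aggregation would.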