Hive QL - Limiting the number of rows per item

Question

If I have multiple items listed in a WHERE clause, how can I limit the results to N rows for each item in the list?

For example:

select a_id, b, c, count(*) as sumrequests
from table_name
where
a_id in (1,2,3)
group by a_id,b,c
limit 10000

Recommended answer

It sounds like you want the top N rows per a_id. You can do this with a window function; these were introduced in Hive 0.11. Something like:

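SELECT a_id, b, c, count(*) AS sumrequests
FROM (
    -- Number the rows within each a_id; with no ORDER BY the numbering is arbitrary.
    -- The alias is rn rather than row, since ROW is a reserved word in newer Hive versions.
    SELECT a_id, b, c, row_number() OVER (PARTITION BY a_id) AS rn
    FROM table_name
    WHERE a_id IN (1, 2, 3)
    ) rs
WHERE rn <= 10000
GROUP BY a_id, b, c;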

This keeps up to 10,000 rows per a_id before the outer query aggregates them. Because the window has no ORDER BY, which rows are kept is arbitrary. If you want the limit to apply to more than just a_id, add further columns to the PARTITION BY clause. You can also use ORDER BY inside the window function to control which rows rank first.
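If the goal is to cap the aggregated output rather than the input rows, one option is to aggregate first and then rank the groups within each a_id. The following is a minimal sketch, assuming the same table_name and columns as in the question; the limit of 10, the ordering by sumrequests, and the aliases agg, ranked, and rn are illustrative choices and not part of the original answer.

```sql
-- Sketch: keep the 10 most frequent (b, c) combinations for each a_id.
SELECT a_id, b, c, sumrequests
FROM (
    -- Rank the aggregated groups within each a_id, largest count first.
    SELECT a_id, b, c, sumrequests,
           row_number() OVER (PARTITION BY a_id ORDER BY sumrequests DESC) AS rn
    FROM (
        -- Aggregate first so the limit applies to result rows, not input rows.
        SELECT a_id, b, c, count(*) AS sumrequests
        FROM table_name
        WHERE a_id IN (1, 2, 3)
        GROUP BY a_id, b, c
    ) agg
) ranked
WHERE rn <= 10;
```

With row_number(), groups that tie on sumrequests are ordered arbitrarily among themselves. If tied groups should all be kept, rank() could be used instead, at the cost of possibly returning more than 10 rows per a_id.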
