Hive QL - Limiting number of rows per each item

Views: 1223  Published: 2018/5/31 18:48:38  Tags: hadoop hql hive hiveql

Question

If I have multiple items listed in a WHERE clause, how would one go about limiting the results to N rows for each item in the list?

Example:

  select a_id, b, c, count(*) as sumrequests
  from table_name
  where a_id in (1, 2, 3)
  group by a_id, b, c
  limit 10000

Solution
It sounds like you want the top N rows per a_id. You can do this with a window function; windowing was introduced in Hive 0.11. Something like:

  -- The alias rn is used instead of row to avoid clashing with the ROW keyword.
  SELECT a_id, b, c, count(*) AS sumrequests
  FROM (
    SELECT a_id, b, c,
           row_number() OVER (PARTITION BY a_id) AS rn
    FROM table_name
  ) rs
  WHERE rn <= 10000 AND a_id IN (1, 2, 3)
  GROUP BY a_id, b, c;

This keeps up to 10,000 rows per a_id before grouping. Because the window has no ORDER BY, which rows are kept is arbitrary rather than defined by any ordering. If you want to limit by more than just a_id, add the other columns to PARTITION BY. You can also use ORDER BY inside the window function to control which rows are numbered first.
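As a rough illustration of the ORDER BY option mentioned above, the sketch below caps the grouped output instead of the input rows: it aggregates first, then keeps the highest-count (b, c) combinations for each a_id. It assumes the goal is "top N groups per a_id by request count", which the question does not state outright. The names table_name, a_id, b, c, and sumrequests come from the question; the aliases grouped, ranked, and rn and the limit of 10 are made up for the example.

```sql
-- Sketch under assumptions: keep the 10 largest (b, c) groups per a_id.
-- grouped, ranked, rn and the cutoff 10 are illustrative, not from the original answer.
SELECT a_id, b, c, sumrequests
FROM (
  SELECT a_id, b, c, sumrequests,
         -- Number the groups within each a_id, largest count first.
         row_number() OVER (PARTITION BY a_id ORDER BY sumrequests DESC) AS rn
  FROM (
    -- Aggregate before ranking so the limit applies to result rows.
    SELECT a_id, b, c, count(*) AS sumrequests
    FROM table_name
    WHERE a_id IN (1, 2, 3)
    GROUP BY a_id, b, c
  ) grouped
) ranked
WHERE rn <= 10;
```

Ties in sumrequests are broken arbitrarily by row_number(); rank() or dense_rank() could be swapped in if tied groups should share a position.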
