hadoop hive count concurrency

Views: 188 Published: 2018/6/12 14:14:05 java hadoop hive


Problem description


How can I implement this in Hadoop?

In Hive, I have a table with many columns, two of which are begin_time and end_time.

I need to count how many visitors there are at each time.

A piece of the table looks like this:

  begin_time end_time
  2011.04.26 10:19:06^A2011.04.26 10:20:22
  2011.04.26 10:19:08^A2011.04.26 10:21:49
  2011.04.26 10:19:08^A2011.04.26 11:18:46
  2011.04.26 10:19:09^A2011.04.26 12:08:36
  2011.04.26 10:19:09^A2011.04.26 11:00:16
  2011.04.26 10:19:11^A2011.04.26 10:19:17
  2011.04.26 10:19:12^A2011.04.26 10:46:21
  2011.04.26 10:19:13^A2011.04.26 10:55:43
  2011.04.26 10:19:17^A2011.04.26 10:19:41
  2011.04.26 10:19:18^A2011.04.26 10:34:41

The result I want is how many people are present at a specific time.

For example, at 2011.04.26 10:19:08 there are 3 visitors: one who began at 10:19:06 and two who began at 10:19:08.

At 2011.04.26 10:19:18 the count is 9: ten visitors have begun by then, but one left at 2011.04.26 10:19:17.

The desired result for this piece is:

  2011.04.26 10:19:06 1
  2011.04.26 10:19:08 3
  2011.04.26 10:19:09 5
  2011.04.26 10:19:11 6
  2011.04.26 10:19:12 7
  2011.04.26 10:19:13 8
  2011.04.26 10:19:17 9
  2011.04.26 10:19:18 9

Any help is much appreciated and welcome.
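
The counting rule described in the question can be checked against the sample rows with a small brute-force sketch. It is written in Python rather than Hive only so that it runs standalone. The names `SAMPLE`, `parse_rows`, and `concurrent_counts` are hypothetical, the `^A` separator is taken literally as shown in the question (in the real file it is presumably a single Ctrl-A character), and a visitor is assumed to still count at the exact second of their end_time, which is what the desired result implies.

```python
# Sample rows copied from the question; "^A" is written out as two
# characters here (assumption: the real file uses one Ctrl-A byte).
SAMPLE = """\
2011.04.26 10:19:06^A2011.04.26 10:20:22
2011.04.26 10:19:08^A2011.04.26 10:21:49
2011.04.26 10:19:08^A2011.04.26 11:18:46
2011.04.26 10:19:09^A2011.04.26 12:08:36
2011.04.26 10:19:09^A2011.04.26 11:00:16
2011.04.26 10:19:11^A2011.04.26 10:19:17
2011.04.26 10:19:12^A2011.04.26 10:46:21
2011.04.26 10:19:13^A2011.04.26 10:55:43
2011.04.26 10:19:17^A2011.04.26 10:19:41
2011.04.26 10:19:18^A2011.04.26 10:34:41
"""


def parse_rows(text, sep="^A"):
    """Split each non-empty line into a (begin_time, end_time) pair."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            begin, end = line.split(sep)
            rows.append((begin, end))
    return rows


def concurrent_counts(rows):
    """For each distinct begin_time t, count rows with begin <= t <= end.

    The timestamps are fixed-width strings, so plain string comparison
    orders them chronologically.
    """
    probe_times = sorted({begin for begin, _ in rows})
    return [
        (t, sum(1 for begin, end in rows if begin <= t <= end))
        for t in probe_times
    ]


if __name__ == "__main__":
    for t, n in concurrent_counts(parse_rows(SAMPLE)):
        print(t, n)
```

This compares every probe time against every row, so it is only meant to pin down the expected numbers on a small sample, not to scale.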
Solution
You can try this in Hive (assuming the table is named test_log):

  select /*+ MAPJOIN(driven) */ driven.time, count(*)
  from
    (select time
     from
       (select begin_time time from test_log
        union all
        select end_time time from test_log) u
     group by time) driven
  join test_log l on true
  where driven.time between l.begin_time and l.end_time
  group by driven.time

It is probably not the best solution, but at least it works. You can add a filter on the driven subquery to reduce the data set.
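
To see what the query above returns on the sample rows without a Hive cluster, here is a rough sketch that runs an equivalent statement in SQLite through Python's standard `sqlite3` module. This is a stand-in, not Hive: the MAPJOIN hint is dropped, `join ... on true` is written as `CROSS JOIN`, the `time` alias is renamed to `ts`, an `ORDER BY` is added for stable output, and the columns are assumed to be fixed-width strings. The names `ROWS`, `QUERY`, and `run_query` are hypothetical.

```python
import sqlite3

# Sample rows from the question: (begin_time, end_time).
ROWS = [
    ("2011.04.26 10:19:06", "2011.04.26 10:20:22"),
    ("2011.04.26 10:19:08", "2011.04.26 10:21:49"),
    ("2011.04.26 10:19:08", "2011.04.26 11:18:46"),
    ("2011.04.26 10:19:09", "2011.04.26 12:08:36"),
    ("2011.04.26 10:19:09", "2011.04.26 11:00:16"),
    ("2011.04.26 10:19:11", "2011.04.26 10:19:17"),
    ("2011.04.26 10:19:12", "2011.04.26 10:46:21"),
    ("2011.04.26 10:19:13", "2011.04.26 10:55:43"),
    ("2011.04.26 10:19:17", "2011.04.26 10:19:41"),
    ("2011.04.26 10:19:18", "2011.04.26 10:34:41"),
]

# SQLite adaptation of the answer's query (see the assumptions above).
QUERY = """
SELECT driven.ts, COUNT(*)
FROM (
    SELECT ts
    FROM (
        SELECT begin_time AS ts FROM test_log
        UNION ALL
        SELECT end_time AS ts FROM test_log
    ) u
    GROUP BY ts
) driven
CROSS JOIN test_log l
WHERE driven.ts BETWEEN l.begin_time AND l.end_time
GROUP BY driven.ts
ORDER BY driven.ts
"""


def run_query(rows):
    """Load the rows into an in-memory table and run the adapted query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test_log (begin_time TEXT, end_time TEXT)")
    conn.executemany("INSERT INTO test_log VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for ts, n in run_query(ROWS):
        print(ts, n)
```

On the sample rows this returns 17 rows, because every distinct end_time is also used as a probe time. The eight rows for begin_time values match the desired result in the question, so one possible filter on the driven subquery is to keep only the times you actually want reported.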
