How to make Hive run MapReduce jobs concurrently?

Views: 18 Published: 2021/12/28 23:39:33 Tags: hadoop mapreduce hive


Problem description

I'm new to Hive and have run into a problem.

I have a table in Hive like this:

create table td(id int, time string, ip string, v1 bigint, v2 int, v3 int,
v4 int, v5 bigint, v6 int)  PARTITIONED BY(dt STRING)
ROW FORMAT DELIMITED FIELDS
TERMINATED BY ','  lines TERMINATED BY '\n';

And I run a SQL statement like this:

from td
INSERT OVERWRITE  DIRECTORY '/tmp/total.out' select count(v1)
INSERT OVERWRITE  DIRECTORY '/tmp/totaldistinct.out' select count(distinct v1)
INSERT OVERWRITE  DIRECTORY '/tmp/distinctuin.out' select distinct v1

INSERT OVERWRITE  DIRECTORY '/tmp/v4.out' select v4 , count(v1), count(distinct v1) group by v4
INSERT OVERWRITE  DIRECTORY '/tmp/v3v4.out' select v3, v4 , count(v1), count(distinct v1) group by v3, v4

INSERT OVERWRITE  DIRECTORY '/tmp/v426.out' select count(v1), count(distinct v1)  where v4=2 or v4=6
INSERT OVERWRITE  DIRECTORY '/tmp/v3v426.out' select v3, count(v1), count(distinct v1) where v4=2 or v4=6 group by v3

INSERT OVERWRITE  DIRECTORY '/tmp/v415.out' select count(v1), count(distinct v1)  where v4=1 or v4=5
INSERT OVERWRITE  DIRECTORY '/tmp/v3v415.out' select v3, count(v1), count(distinct v1) where v4=1 or v4=5 group by v3

It works, and the output is what I want.

But there is one problem: Hive generates 9 MapReduce jobs and runs them one by one.

I ran EXPLAIN on this query and got the following output:

STAGE DEPENDENCIES:
  Stage-9 is a root stage
  Stage-0 depends on stages: Stage-9
  Stage-10 depends on stages: Stage-9
  Stage-1 depends on stages: Stage-10
  Stage-11 depends on stages: Stage-9
  Stage-2 depends on stages: Stage-11
  Stage-12 depends on stages: Stage-9
  Stage-3 depends on stages: Stage-12
  Stage-13 depends on stages: Stage-9
  Stage-4 depends on stages: Stage-13
  Stage-14 depends on stages: Stage-9
  Stage-5 depends on stages: Stage-14
  Stage-15 depends on stages: Stage-9
  Stage-6 depends on stages: Stage-15
  Stage-16 depends on stages: Stage-9
  Stage-7 depends on stages: Stage-16
  Stage-17 depends on stages: Stage-9
  Stage-8 depends on stages: Stage-17

It seems that stages 9-17 correspond to MapReduce jobs 0-8,
but according to the EXPLAIN output above, stages 10-17 depend only on stage 9.
So my question is: why can't jobs 1-8 run concurrently?

Or how can I make jobs 1-8 run concurrently?

Thank you very much for your help!
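One setting that may be relevant to the question above: Hive has a `hive.exec.parallel` property, off by default, that lets stages with no dependency on each other be submitted at the same time instead of one after another. The snippet below is a minimal sketch, assuming it is run in the same Hive session before the multi-insert query and that the cluster has enough free capacity to run several jobs at once. The thread number of 8 is an illustrative value, not a recommendation from the original post.

```sql
-- Assumption: executed in the same session, before the "from td INSERT ..." query.
-- Allow stages that do not depend on each other to be launched in parallel.
SET hive.exec.parallel=true;
-- Upper bound on how many stages may run at the same time (illustrative value).
SET hive.exec.parallel.thread.number=8;
```

With this enabled, stages 10-17, which depend only on stage 9 in the EXPLAIN output, would be candidates for parallel submission once stage 9 finishes. Whether they actually overlap still depends on the resources the cluster scheduler grants.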
