Running jobs in parallel in Hadoop


Question



I am new to Hadoop.

I have set up a 2-node cluster.

How can I run 2 jobs in parallel in Hadoop? When I submit jobs, they run one after another in FIFO order. I need the jobs to run in parallel. How can I achieve that?

Thanks,
MRK

Solution

Hadoop can be configured with a number of schedulers; the default is the FIFO scheduler.

The FIFO scheduler behaves as follows.

Scenario 1: The cluster has a capacity of 10 map tasks and job1 needs 15 map tasks. Running job1 then occupies the entire cluster. Only as job1 makes progress and frees slots it no longer needs does job2 start running on the cluster.

Scenario 2: The cluster has a capacity of 10 map tasks and job1 needs 6 map tasks. job1 takes 6 slots and job2 takes the remaining 4, so job1 and job2 run in parallel.
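Both scenarios reduce to the same rule: in submission order, each job grabs as many of the remaining free slots as it demands. A minimal Python sketch of that rule (illustrative only, not Hadoop's actual scheduler code):

```python
def fifo_allocate(capacity, job_demands):
    """Hand out map slots to jobs in FIFO order: each job takes as many
    of the remaining free slots as it needs. Returns slots per job."""
    free = capacity
    allocation = []
    for demand in job_demands:
        granted = min(demand, free)
        allocation.append(granted)
        free -= granted
    return allocation

# Scenario 1: job1 wants 15 on a 10-slot cluster -> job1 gets all 10, job2 waits.
print(fifo_allocate(10, [15, 4]))  # [10, 0]

# Scenario 2: job1 wants 6 -> job1 gets 6, job2 gets 4; they run in parallel.
print(fifo_allocate(10, [6, 4]))   # [6, 4]
```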

To run jobs in parallel from the start, you can configure either the Fair Scheduler or the Capacity Scheduler, depending on your requirements. For this to take effect, mapreduce.jobtracker.taskscheduler and the scheduler-specific parameters have to be set in mapred-site.xml.
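As a sketch, switching the JobTracker to the Fair Scheduler in mapred-site.xml might look like the following (the exact property and class names vary across Hadoop versions, and the allocation-file path is a hypothetical example, so check the documentation for your release):

```xml
<!-- mapred-site.xml: replace the default FIFO scheduler with the Fair Scheduler -->
<property>
  <name>mapreduce.jobtracker.taskscheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>

<!-- Scheduler-specific setting: file holding the Fair Scheduler's pool definitions -->
<property>
  <name>mapred.fairscheduler.allocation.file</name>
  <value>/etc/hadoop/conf/fair-scheduler.xml</value>
</property>
```

After editing the file, the JobTracker has to be restarted for the new scheduler to be picked up.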

Edit: Updated the answer based on the comment from MRK.
