How to run Hadoop in a multithreaded way in a single JVM?


Question

I have a 4-core desktop and want to use all of its cores for local data processing with Hadoop (sometimes I have enough power to process the data locally, and sometimes I submit the same jobs to a cluster).

By default, Hadoop's local mode runs only one mapper and one reducer, so my local jobs are really slow. I don't want to set up a cluster on a single machine: first, because of the "painful" configuration, and second, because I would have to build a jar each time. So the perfect solution would be a way to run embedded Hadoop on a single machine.

PS: pseudo-distributed mode is a bad option, since it creates a cluster with a single node, so I would get only one mapper and would still have to spend time on additional configuration.

Solution

Use MultithreadedMapRunner (hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/MultithreadedMapRunner.html): pass it to JobConf's setMapRunnerClass method, and don't forget to set mapred.map.multithreadedrunner.threads to the desired concurrency level.
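As a rough sketch of that setup with the old org.apache.hadoop.mapred API: the driver class LocalMultithreadedJob, the mapper LineMapper, the job name, the thread count of 4 and the use of args[0]/args[1] as input and output paths are all assumptions made for illustration. It needs the Hadoop libraries on the classpath to compile.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.MultithreadedMapRunner;

// Hypothetical driver class, named here only for the example.
public class LocalMultithreadedJob {

    // Hypothetical mapper: emits every input line with a count of 1.
    // It keeps no mutable fields, so concurrent map() calls do not interfere.
    public static class LineMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        public void map(LongWritable offset, Text line,
                        OutputCollector<Text, LongWritable> output,
                        Reporter reporter) throws IOException {
            output.collect(new Text(line), new LongWritable(1));
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(LocalMultithreadedJob.class);
        conf.setJobName("local-multithreaded");

        conf.setMapperClass(LineMapper.class);
        // Run the mapper through the multithreaded runner.
        conf.setMapRunnerClass(MultithreadedMapRunner.class);
        // Concurrency level; 4 is an assumed value matching a 4-core machine.
        conf.setInt("mapred.map.multithreadedrunner.threads", 4);

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LongWritable.class);

        // Assumed convention: args[0] is the input path, args[1] the output path.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
```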

There is also another way:

• set MultithreadedMapper as the mapper class on your Job object
• call MultithreadedMapper.setMapperClass with your actual mapper class
• call MultithreadedMapper.setNumberOfThreads with the desired concurrency level
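
The three steps above might look roughly like this with the newer org.apache.hadoop.mapreduce API. This is only a sketch: the class names NewApiMultithreadedJob and LineMapper, the job name, the thread count of 4 and the argument layout are assumptions for the example, and Job.getInstance assumes a Hadoop 2 or later client library on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver class, named here only for the example.
public class NewApiMultithreadedJob {

    // Hypothetical "actual" mapper: emits every input line with a count of 1.
    public static class LineMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(line), new LongWritable(1));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "local-multithreaded");
        job.setJarByClass(NewApiMultithreadedJob.class);

        // Step 1: the job's mapper class is MultithreadedMapper itself.
        job.setMapperClass(MultithreadedMapper.class);
        // Step 2: tell MultithreadedMapper which mapper does the real work.
        MultithreadedMapper.setMapperClass(job, LineMapper.class);
        // Step 3: concurrency level; 4 is an assumed value for a 4-core machine.
        MultithreadedMapper.setNumberOfThreads(job, 4);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // Assumed convention: args[0] is the input path, args[1] the output path.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```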

But be careful: your mapper class must be thread safe, and its setup and cleanup methods will be called several times. So it isn't a smart idea to mix MultithreadedMapper with MultipleOutputs, unless you implement your own MultithreadedMapper-inspired class.
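
To make the thread-safety requirement concrete, here is a small plain-Java illustration that does not use Hadoop at all. SharedStateDemo and CountingMapper are hypothetical names; CountingMapper merely stands in for a mapper whose map method is called from several threads at once and which keeps shared state (here, a record counter).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

final class SharedStateDemo {

    // Hypothetical stand-in for a mapper that several threads call concurrently.
    static final class CountingMapper {
        // LongAdder tolerates concurrent increments without losing updates.
        private final LongAdder records = new LongAdder();

        void map(String line) {
            records.increment();
        }

        long recordsSeen() {
            return records.sum();
        }
    }

    // Calls map() callsPerThread times from each of `threads` worker threads
    // on one shared CountingMapper and returns the total it recorded.
    static long runConcurrently(int threads, int callsPerThread) {
        CountingMapper mapper = new CountingMapper();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    mapper.map("line " + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return mapper.recordsSeen();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(4, 100_000));
    }
}
```

If the counter were a plain long field incremented with ++, concurrent calls could lose updates and the total could come out too low. The same reasoning applies to any mutable field or static state a real mapper touches.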
