How to run Hadoop in a multithreaded way in a single JVM?
Question
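I have a 4-core desktop and want to use all of my cores for local data processing with Hadoop (sometimes I have enough power to process the data locally, and sometimes I submit the same jobs to a cluster).
By default, Hadoop's local mode runs only one mapper and one reducer, so my local jobs are really slow.
I do not want to set up a cluster on a single machine: first, because of the "painful" configuration, and second, because I would have to create a jar each time. So the perfect solution would be a way to run embedded Hadoop on a single machine.
PS: pseudo-distributed mode is a bad option, since it creates a cluster with a single node, so I would get only one mapper and would have to spend time on additional configuration.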
Answer
You need to use MultithreadedMapRunner (hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/MultithreadedMapRunner.html): just set it in JobConf's setMapRunnerClass method, and don't forget to set mapred.map.multithreadedrunner.threads to the desired concurrency level.
There is also another way. You should call:
MultithreadedMapper.setMapperClass with your actual mapper class
MultithreadedMapper.setNumberOfThreads with the desired concurrency level
But be careful: your mapper class should be thread safe, and its setup and cleanup methods will be called several times, so it isn't a smart idea to mix MultithreadedMapper with MultipleOutputs unless you implement your own class inspired by MultithreadedMapper.
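The thread-safety caveat in the answer can be illustrated without Hadoop on the classpath. The following is a minimal plain-Java sketch, not Hadoop code: `ThreadSafeMapSketch`, `WordCountMapper` and `runMultithreaded` are hypothetical names invented for this illustration. It assumes a simplified model in which a fixed pool of threads calls one shared map function, so any state the mapper touches has to tolerate concurrent updates.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Consumer;

// Plain-Java model only: none of these names are Hadoop APIs.
class ThreadSafeMapSketch {

    // Hypothetical stand-in for a mapper that is shared by several threads.
    static final class WordCountMapper implements Consumer<String> {
        // ConcurrentHashMap + LongAdder keep concurrent updates safe;
        // a plain HashMap<String, Long> here could lose counts.
        final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

        @Override
        public void accept(String line) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.computeIfAbsent(word, k -> new LongAdder()).increment();
                }
            }
        }
    }

    // Hypothetical helper: feeds every record to the shared mapper from a
    // fixed pool of `threads` workers, then waits for all of them to finish.
    static void runMultithreaded(List<String> records, Consumer<String> mapper, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String record : records) {
            pool.execute(() -> mapper.accept(record));
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("map tasks did not finish in time");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> records = List.of("a b a", "b c", "a");
        WordCountMapper mapper = new WordCountMapper();
        // Use every available core, as the question asks for on a 4-core desktop.
        int threads = Runtime.getRuntime().availableProcessors();
        runMultithreaded(records, mapper, threads);
        mapper.counts.forEach((word, n) -> System.out.println(word + "\t" + n.sum()));
    }
}
```

In a real job the same concern applies to whatever state the mapper touches. With the new API, the wiring is typically `job.setMapperClass(MultithreadedMapper.class)` followed by the two static MultithreadedMapper calls named in the answer, each taking the `Job` as its first argument.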