Running a standalone Hadoop application on multiple CPU cores

Question

My team built a Java application using the Hadoop libraries to transform a bunch of input files into useful output. Given the current load, a single multicore server will do fine for the coming year or so. We do not (yet) have the need to go for a multiserver Hadoop cluster, yet we chose to start this project "being prepared".

When I run this app on the command line (or in Eclipse or NetBeans) I have not yet been able to convince it to use more than one map and/or reduce thread at a time. Given that the tool is very CPU intensive, this "single-threadedness" is my current bottleneck.

When running it in the NetBeans profiler I do see that the app starts several threads for various purposes, but only a single map/reduce is running at the same moment.

The input data consists of several input files, so Hadoop should at least be able to run one thread per input file at the same time during the map phase.

What do I do to have at least 2 or even 4 active threads running (which should be possible for most of this application's processing time)?

I'm expecting this to be something very silly that I've overlooked.

Update: I just found this: https://issues.apache.org/jira/browse/MAPREDUCE-1367
This implements the feature I was looking for in Hadoop 0.21. It introduces the flag mapreduce.local.map.tasks.maximum to control it.
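As a sketch of how that flag could be set (assuming Hadoop 0.21 or later, where MAPREDUCE-1367 landed, and a job running under the LocalJobRunner), the property can go into mapred-site.xml; the value 4 here is just an illustrative choice:

```xml
<!-- mapred-site.xml: local-mode map parallelism (Hadoop 0.21+, per MAPREDUCE-1367) -->
<configuration>
  <property>
    <!-- Maximum number of map tasks the LocalJobRunner runs concurrently -->
    <name>mapreduce.local.map.tasks.maximum</name>
    <value>4</value>
  </property>
</configuration>
```

If the driver class goes through ToolRunner, the same setting can also be passed on the command line as `-D mapreduce.local.map.tasks.maximum=4` without editing any config file.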

For now I've also found the solution described here in this question.

Answer

I'm not sure if I'm correct, but when you are running tasks in local mode, you can't have multiple mappers/reducers.

Anyway, to set the maximum number of running mappers and reducers, use the configuration options mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum. By default those options are set to 2, so I might be right.
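In configuration terms, the answer's suggestion would look roughly like this (a sketch for a mapred-site.xml read by each TaskTracker; note these are per-TaskTracker slot limits, not per-job settings, and the value 4 is illustrative):

```xml
<!-- mapred-site.xml: task slots per TaskTracker -->
<configuration>
  <property>
    <!-- Max concurrent map task slots on this TaskTracker (default: 2) -->
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <!-- Max concurrent reduce task slots on this TaskTracker (default: 2) -->
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
</configuration>
```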

Finally, if you want to be prepared for a multinode cluster, go straight to running this in fully-distributed mode, but have all the servers (namenode, datanode, tasktracker, jobtracker, ...) run on a single machine.
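A minimal single-machine setup of that kind (usually called pseudo-distributed mode) could be sketched as follows; the hostnames and ports are the conventional defaults from the Hadoop 0.20-era docs, so adjust them for your install:

```xml
<!-- core-site.xml: point the filesystem at a local HDFS namenode -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

```xml
<!-- mapred-site.xml: point job submission at a local JobTracker -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```

With this in place the job no longer runs under the LocalJobRunner, so the per-TaskTracker slot limits above apply, and the same configuration transfers to a real cluster later by changing the hostnames.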
