hadoop - how total mappers are determined


Problem Description

I am new to Hadoop and just installed Oracle's VirtualBox and the Hortonworks sandbox. I then downloaded the latest version of Hadoop and imported the jar files into my Java program. I copied a sample wordcount program and created a new jar file. I ran this jar file as a job using the sandbox. The wordcount works perfectly fine, as expected. However, on my job status page, I see that the number of mappers for my input file is determined as 28. My input file contains the following line.

Ramesh is studying at XXXXXXXXXX XX XXXXX XX XXXXXXXXX.

How is the total number of mappers determined as 28?

I added the line below to my wordcount.java program to check:

FileInputFormat.setMaxInputSplitSize(job, 2); // caps each input split at a maximum of 2 bytes

Also, I would like to know if the input file can contain only 2 rows. That is, suppose I have an input file like the one below.

row1,row2,row3,row4,row5,row6.......row20

Should I split the input file into 20 different files each having only 2 rows?

Solution

That means your input file is split into roughly 28 parts (blocks) in HDFS, since you said 28 map tasks were scheduled. However, that does not mean all 28 map tasks will run in parallel. Parallelism depends on the number of slots available in your cluster. I'm speaking in terms of Apache Hadoop; I don't know whether Hortonworks has modified this behavior.
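
For reference, Apache Hadoop's FileInputFormat derives the split size as max(minSize, min(maxSize, blockSize)); the minimal sketch below plugs in the values from the question. The 56-byte file length is a hypothetical figure for the one-line input, but with the 2-byte cap from setMaxInputSplitSize(job, 2) it would produce exactly 28 splits:

    // Sketch of Apache Hadoop's split-size arithmetic (FileInputFormat.computeSplitSize).
    public class SplitMath {
        // splitSize = max(minSize, min(maxSize, blockSize))
        static long computeSplitSize(long blockSize, long minSize, long maxSize) {
            return Math.max(minSize, Math.min(maxSize, blockSize));
        }

        public static void main(String[] args) {
            long blockSize  = 128L * 1024 * 1024; // typical HDFS block size (128 MB)
            long minSize    = 1;                  // default minimum split size
            long maxSize    = 2;                  // from setMaxInputSplitSize(job, 2)
            long fileLength = 56;                 // hypothetical size of the one-line input, in bytes

            long splitSize = computeSplitSize(blockSize, minSize, maxSize); // -> 2
            long numSplits = (fileLength + splitSize - 1) / splitSize;      // ceil(56 / 2) -> 28
            System.out.println(numSplits + " splits, hence " + numSplits + " map tasks");
        }
    }

(The real implementation allows a split to run slightly over the split size, so the count is approximate, but the shape of the calculation is the same.)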

Hadoop prefers to work with large files, so do you really want to split your input file into 20 different files?
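
If the goal is simply to control the number of mappers, the job driver can bound the split size instead of splitting the input by hand. A minimal sketch, assuming the standard org.apache.hadoop.mapreduce API; the 64 MB and 128 MB bounds and the /user/input path are example values, not recommendations:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    // Fragment from inside a job driver:
    Job job = Job.getInstance(new Configuration(), "wordcount");
    // Never make a split smaller than 64 MB (example lower bound).
    FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);
    // Never make a split larger than 128 MB (example upper bound).
    FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);
    FileInputFormat.addInputPath(job, new Path("/user/input"));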

