How does Hadoop perform input splits?
Question
This is a conceptual question involving Hadoop/HDFS. Let's say you have a file containing 1 billion lines, and for simplicity let's assume each line is of the form <k, v>, where k is the offset of the line from the beginning of the file and v is the content of the line.
Now, when we say that we want to run N map tasks, does the framework split the input file into N splits and run each map task on one of them? Or do we have to write a partitioning function that produces the N splits and then run each map task on a generated split?
All I want to know is whether the splits are done internally, or whether we have to split the data manually.
More specifically, each time the map() function is called, what are its Key key and Value val parameters?
Thanks, Deepak
Answer
The InputFormat is responsible for providing the splits.
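As a rough illustration of what FileInputFormat does internally, the sketch below computes split boundaries for a file from its total size and a split size. The class and method names here are made up for this example, and the real logic is more involved (it also honours configured minimum/maximum split sizes and a slop factor, and avoids splitting mid-block):

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Simplified version of FileInputFormat's split computation:
    // each split covers the byte range [offset, offset + length) of the file.
    static List<long[]> computeSplits(long fileSize, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        long offset = 0;
        while (offset < fileSize) {
            long length = Math.min(splitSize, fileSize - offset);
            splits.add(new long[] { offset, length });
            offset += length;
        }
        return splits;
    }

    public static void main(String[] args) {
        // A 300 MB file with a 128 MB split size yields three splits:
        // 128 MB, 128 MB, and a final 44 MB remainder.
        long mb = 1024L * 1024L;
        for (long[] s : computeSplits(300 * mb, 128 * mb)) {
            System.out.println("offset=" + s[0] + " length=" + s[1]);
        }
    }
}
```

By default the split size equals the HDFS block size, which is why the number of map tasks typically equals the number of blocks rather than anything you choose directly.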
In general, if you have n nodes, HDFS will distribute the file over all these n nodes. If you start a job, there will by default be one mapper per input split (typically one per HDFS block). Thanks to Hadoop, the mapper on a machine will process the part of the data that is stored on that node. I think this is called rack awareness.
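As for the map() parameters: with the default TextInputFormat, the key is the byte offset of the line within the file (a LongWritable) and the value is the line's content (a Text), which matches the <k, v> form in the question. The plain-Java sketch below mimics how a file's bytes become such <offset, line> records; the class and method names are invented for illustration (in Hadoop this is done by LineRecordReader), and it assumes single-byte characters:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RecordSketch {
    // Mimics what LineRecordReader hands to map():
    // key = byte offset of the line, value = line content.
    // Assumes single-byte characters and '\n' line endings.
    static Map<Long, String> toRecords(String fileContents) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : fileContents.split("\n", -1)) {
            records.put(offset, line);
            offset += line.length() + 1; // +1 for the newline byte
        }
        return records;
    }

    public static void main(String[] args) {
        Map<Long, String> records = toRecords("first line\nsecond line\nthird");
        // Prints: 0 -> first line, 11 -> second line, 23 -> third
        records.forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

So in map(LongWritable key, Text value, ...), your map task never sees the whole file, only the records reconstructed from its own split.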
So, to make a long story short: upload the data to HDFS and start an MR job. Hadoop will take care of optimised execution.