MapReduce job output sort order

Views: 23 | Published: 2021/12/15 18:56:33 | Tags: hadoop, mapreduce


Question

In my MapReduce jobs, I can see that the output of the reduce phase is sorted by key.

So if I set the number of reducers to 10, the output directory contains 10 files, and each of those output files holds sorted data.

The reason I am asking is that although the data inside each file is sorted, the files are not sorted relative to one another. For example, assuming I use Text as the key, there are cases where a single part-000* file starts at 0 and ends at zzzz.
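The behaviour described above can be reproduced without a cluster. The following is only a minimal sketch, not the job from the question: it assumes the job uses Hadoop's default HashPartitioner, whose rule is `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, and it substitutes `String.hashCode()` for the hash of `Text`, so the exact assignment in a real job will differ. The class name `HashPartitionDemo` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class HashPartitionDemo {
    // Same formula as Hadoop's default HashPartitioner, applied here to
    // String.hashCode() as a stand-in for Text (an assumption of this sketch).
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Each TreeSet plays the role of one part file: sorted internally.
    static List<TreeSet<String>> partition(List<String> keys, int numReducers) {
        List<TreeSet<String>> parts = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            parts.add(new TreeSet<>());
        }
        for (String key : keys) {
            parts.get(partitionFor(key, numReducers)).add(key);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Keys "a" through "z", one per letter.
        List<String> keys = new ArrayList<>();
        for (char c = 'a'; c <= 'z'; c++) {
            keys.add(String.valueOf(c));
        }
        List<TreeSet<String>> parts = partition(keys, 10);
        for (int i = 0; i < parts.size(); i++) {
            System.out.printf("part-%05d: %s%n", i, parts.get(i));
        }
    }
}
```

Every simulated file is sorted internally, but because a key's file is chosen by its hash rather than by its position in the sort order, each file draws keys from across the whole alphabet.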

I was assuming that the output would also be sorted across files, i.e. the first file should contain the keys starting with a, and the last file, part-00009, should contain the entries with zzzz, or at least keys greater than a.
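The layout expected here corresponds to range partitioning, where each reducer receives one contiguous slice of the key space. Hadoop ships a TotalOrderPartitioner for that purpose. The sketch below does not use it and only illustrates the idea in plain Java, with a hypothetical class `RangePartitionDemo`, made-up split points, and three partitions instead of ten.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class RangePartitionDemo {
    // Hypothetical split points (must be sorted): keys below "i" go to
    // partition 0, keys from "i" up to but excluding "r" go to partition 1,
    // and everything else goes to partition 2.
    static final String[] SPLIT_POINTS = {"i", "r"};

    static int partitionFor(String key) {
        int pos = Arrays.binarySearch(SPLIT_POINTS, key);
        // For an absent key, binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    // Each TreeSet plays the role of one part file: sorted internally.
    static List<TreeSet<String>> partition(List<String> keys) {
        List<TreeSet<String>> parts = new ArrayList<>();
        for (int i = 0; i <= SPLIT_POINTS.length; i++) {
            parts.add(new TreeSet<>());
        }
        for (String key : keys) {
            parts.get(partitionFor(key)).add(key);
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (char c = 'a'; c <= 'z'; c++) {
            keys.add(String.valueOf(c));
        }
        List<TreeSet<String>> parts = partition(keys);
        for (int i = 0; i < parts.size(); i++) {
            System.out.printf("part-%05d: %s%n", i, parts.get(i));
        }
    }
}
```

With this rule, every key in one file sorts before every key in the next file, so reading the files in order yields a globally sorted result. In a real job the split points would have to match the actual key distribution, otherwise some reducers receive far more data than others.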

This assumes that my keys are uniformly distributed across all letters of the alphabet.

Could someone shed some light on why it behaves this way?
