In MapReduce, how to sort intermediate output based on values?


Problem description

How do I sort intermediate output based on values in MapReduce?





What I have tried:

How to sort intermediate output based on values in MapReduce?

Recommended answer

"By default, MapReduce sorts the intermediate data (between the mapper and reducer phases) by key. If we want the data sorted by value, that is, the values for each key delivered to the reducer in sorted order, we need a secondary sort. There are two approaches.

1. Let the reducer receive all the values for a particular key, buffer them, and sort them by value inside the reducer. This is not a good approach in general: the reducer has to hold every value for the key at once, so it may run out of memory. It can work well for small data sets.

2. Create a composite key made of two parts, the natural key and the natural value. The natural key is used for partitioning and the value is used for sorting. This is the better approach because the framework does the sorting during the shuffle, so the reducer does not have to buffer the values and will not hit an out-of-memory error. We write a custom partitioner so that all data with the same natural key goes to the same reducer, and a grouping comparator so that the data arriving at the reducer is grouped by the natural key.
"
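The two approaches in the answer can be sketched in plain Java. This is only a simulation under stated assumptions: it uses no Hadoop classes, and every name in it (`SecondarySortSketch`, `CompositeKey`, `partition`, `SORT_ORDER`, `shuffleFor`, `sortInReducer`) is hypothetical and chosen for illustration. In a real Hadoop job the same three roles would be plugged in through `Job.setPartitionerClass`, `Job.setSortComparatorClass`, and `Job.setGroupingComparatorClass`, with the composite key implemented as a `WritableComparable`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of value sorting in MapReduce; no Hadoop classes are used.
// All names in this class are hypothetical and exist only for illustration.
class SecondarySortSketch {

    // Approach 1: buffer every value for one key in memory, then sort.
    // Simple, but memory use grows with the number of values per key.
    static List<Integer> sortInReducer(Iterable<Integer> valuesForOneKey) {
        List<Integer> buffer = new ArrayList<>();
        for (Integer v : valuesForOneKey) {
            buffer.add(v);
        }
        Collections.sort(buffer);
        return buffer;
    }

    // Approach 2: composite key = natural key + the value to sort by.
    static final class CompositeKey {
        final String naturalKey;
        final int value;

        CompositeKey(String naturalKey, int value) {
            this.naturalKey = naturalKey;
            this.value = value;
        }
    }

    // Partitioner role: use only the natural key, so all records that
    // share a natural key are sent to the same reducer.
    static int partition(CompositeKey key, int numReducers) {
        return (key.naturalKey.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Sort comparator role: order by natural key first, then by value.
    static final Comparator<CompositeKey> SORT_ORDER =
            Comparator.comparing((CompositeKey k) -> k.naturalKey)
                      .thenComparingInt(k -> k.value);

    // Simulates the shuffle for one reducer: keep this reducer's records,
    // sort them by the composite key, then group by natural key only
    // (the grouping comparator role). Values arrive already sorted.
    static Map<String, List<Integer>> shuffleFor(int reducer, int numReducers,
                                                 List<CompositeKey> mapOutput) {
        List<CompositeKey> mine = new ArrayList<>();
        for (CompositeKey k : mapOutput) {
            if (partition(k, numReducers) == reducer) {
                mine.add(k);
            }
        }
        mine.sort(SORT_ORDER);
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (CompositeKey k : mine) {
            groups.computeIfAbsent(k.naturalKey, x -> new ArrayList<>()).add(k.value);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> mapOutput = new ArrayList<>();
        mapOutput.add(new CompositeKey("b", 7));
        mapOutput.add(new CompositeKey("a", 9));
        mapOutput.add(new CompositeKey("a", 1));
        mapOutput.add(new CompositeKey("b", 2));
        mapOutput.add(new CompositeKey("a", 5));

        int numReducers = 2;
        for (int r = 0; r < numReducers; r++) {
            System.out.println("reducer " + r + ": " + shuffleFor(r, numReducers, mapOutput));
        }
    }
}
```

In the simulation, `sortInReducer` holds a full list per key, which is the memory risk the answer describes, while `shuffleFor` never sorts inside a group: the ordering comes entirely from the composite key, and the grouping step only looks at the natural key.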

