Hadoop partitioner


Question

I want to ask about the Hadoop partitioner: is it implemented within the Mappers? How can I measure the performance of the default hash partitioner, and is there a better partitioner for reducing data skew?

Thanks

Answer

The Partitioner is a key component between the Mappers and Reducers: it distributes the map-emitted data among the Reducers.

The Partitioner runs within every Map Task JVM (Java process).

The default partitioner, HashPartitioner, works by hashing and is much faster than other partitioners such as TotalOrderPartitioner. It runs a hash function on every map output key, i.e.:

Reduce_Number = (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
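The formula above can be tried out in plain Java with no Hadoop dependencies (a minimal sketch; the class and method names are made up, only the formula itself comes from HashPartitioner). The `& Integer.MAX_VALUE` mask clears the sign bit, so a key with a negative `hashCode()` still lands on a valid, non-negative reducer index:

```java
public class HashPartitionDemo {
    // Mirrors the default hash partitioning: mask the sign bit,
    // then take the remainder modulo the number of reduce tasks.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4;
        // String hash codes can be negative; the mask keeps the result in range.
        for (String key : new String[] {"a", "b", "polygenelubricants"}) {
            System.out.println(key + " -> reducer " + getPartition(key, numReduceTasks));
        }
    }
}
```

Note that every occurrence of the same key hashes to the same reducer, which is exactly what lets a reducer see all values for its keys, and also exactly what causes skew when one key dominates.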

To check the performance of the Hash Partitioner, use the Reduce task counters and see how the records were distributed among the reducers.
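The same distribution can be previewed offline by replaying a sample of map output keys through the hash formula and tallying per-reducer counts (a sketch, not a Hadoop API; the sample keys are invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class SkewCheck {
    // Count how many records each reducer would receive under
    // default hash partitioning for a sample of map output keys.
    static Map<Integer, Integer> tally(String[] keys, int numReduceTasks) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (String key : keys) {
            int p = (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
            counts.merge(p, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // A skewed sample: one hot key dominates the input.
        String[] keys = {"hot", "hot", "hot", "hot", "x", "y", "z"};
        // All four "hot" records hash to the same reducer.
        System.out.println(tally(keys, 3));
    }
}
```

A large gap between the busiest and the idlest reducer in such a tally (or in the actual Reduce input records counters) is the signal that the default partitioner is not spreading the load.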

The Hash Partitioner is a basic partitioner and is not well suited to processing data with high skewness.

To address data skew problems, we need to write a custom partitioner class that extends the Partitioner.java class from the MapReduce API.

An example of such a custom partitioner is a RandomPartitioner. It is one of the best ways to distribute skewed data evenly among the reducers.
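A minimal sketch of that idea in plain Java (the hot-key name and class are assumptions for illustration, not an actual RandomPartitioner implementation): route a known hot key to a random reducer, and hash everything else normally so those keys still meet in a single reducer:

```java
import java.util.Random;

public class RandomSpreadPartitioner {
    // Hypothetical hot key, assumed identified in an earlier profiling pass.
    static final String HOT_KEY = "hot";
    static final Random RANDOM = new Random();

    // Records of the hot key are scattered across all reducers;
    // all other keys keep the default hash routing.
    static int getPartition(String key, int numReduceTasks) {
        if (HOT_KEY.equals(key)) {
            return RANDOM.nextInt(numReduceTasks);
        }
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println("hot -> reducer " + getPartition("hot", 4));
        }
        System.out.println("cold -> reducer " + getPartition("cold", 4));
    }
}
```

In a real job this logic would live in a subclass of `org.apache.hadoop.mapreduce.Partitioner` registered with `job.setPartitionerClass(...)`. Note the trade-off: randomly spreading a key only works for aggregations that can be combined in a second pass, since the hot key's values no longer arrive at a single reducer.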
