Is it better to use the mapred or the mapreduce package to create a Hadoop Job?


Question


To create MapReduce jobs you can use either the old org.apache.hadoop.mapred package or the newer org.apache.hadoop.mapreduce package for Mappers, Reducers, Jobs, ... The first one had been marked as deprecated, but this was reverted in the meantime. Now I wonder whether it is better to use the old mapred package or the new mapreduce package to create a job, and why. Or does it just depend on whether you need things like MultipleTextOutputFormat, which is only available in the old mapred package?

Solution

Functionality-wise, there is not much difference between the old (o.a.h.mapred) and the new (o.a.h.mapreduce) API. The only significant difference is that in the old API records are pushed to the mapper/reducer, whereas the new API supports both push and pull mechanisms. You can get more information about the pull mechanism here.
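The push/pull distinction can be sketched in plain Java. The types below (`PushMapper`, `RecordContext`, `PullMapper`, `ListContext`, `PairingMapper`) are hypothetical stand-ins, not Hadoop classes; they only mimic the shape of the two APIs. In Hadoop itself, the old API's `Mapper.map(key, value, output, reporter)` is invoked by the framework once per record, while the new API's `Mapper.run(Context)` can be overridden to iterate with `context.nextKeyValue()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins that mimic the shape of the two APIs; NOT Hadoop classes.
class PushPullSketch {

    /** Old-API shape: the framework pushes exactly one record per call. */
    interface PushMapper {
        void map(String record, List<String> output);
    }

    /** New-API shape: a context the mapper can pull records from. */
    interface RecordContext {
        boolean nextRecord();     // advances; returns false when input is exhausted
        String currentRecord();
        void write(String out);
    }

    /** New-API shape: run() owns the loop and may be overridden. */
    abstract static class PullMapper {
        protected abstract void map(String record, RecordContext context);

        // Default behaviour is equivalent to push: one map() call per record.
        public void run(RecordContext context) {
            while (context.nextRecord()) {
                map(context.currentRecord(), context);
            }
        }
    }

    /** Simple in-memory context backed by a list (assumption for this sketch). */
    static final class ListContext implements RecordContext {
        private final Iterator<String> input;
        private final List<String> output = new ArrayList<>();
        private String current;

        ListContext(List<String> records) { this.input = records.iterator(); }

        @Override public boolean nextRecord() {
            if (!input.hasNext()) { return false; }
            current = input.next();
            return true;
        }
        @Override public String currentRecord() { return current; }
        @Override public void write(String out) { output.add(out); }
        List<String> output() { return output; }
    }

    /** Push driver: the framework controls iteration. */
    static List<String> runPush(List<String> records, PushMapper mapper) {
        List<String> output = new ArrayList<>();
        for (String record : records) {
            mapper.map(record, output);
        }
        return output;
    }

    /** Pull driver: the framework hands over the context and the mapper iterates. */
    static List<String> runPull(List<String> records, PullMapper mapper) {
        ListContext context = new ListContext(records);
        mapper.run(context);
        return context.output();
    }

    /** Pulls two records at a time, which a pure push API cannot express directly. */
    static final class PairingMapper extends PullMapper {
        @Override protected void map(String record, RecordContext context) {
            context.write(record);
        }

        @Override public void run(RecordContext context) {
            while (context.nextRecord()) {
                String first = context.currentRecord();
                if (context.nextRecord()) {
                    context.write(first + "+" + context.currentRecord());
                } else {
                    map(first, context); // odd record left over
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c");
        System.out.println(runPush(records, (r, out) -> out.add(r.toUpperCase()))); // [A, B, C]
        System.out.println(runPull(records, new PairingMapper()));                  // [a+b, c]
    }
}
```

A mapper that leaves `run` untouched behaves exactly like the push style, which is why most jobs look the same in either API; overriding `run` only matters when one step needs to see more than one record.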

Also, the old API has been un-deprecated as of Hadoop 0.21. You can find more information about the new API here.

As you mentioned, some classes (such as MultipleTextOutputFormat) have not been migrated to the new API. For this and the reasons mentioned above, it is better to stick with the old API (although translating between the two is usually quite simple).
