Hadoop Distribution Differences


Question

Can somebody outline the differences between the various Hadoop distributions available, using the Apache Hadoop distro as a baseline?

Is there a good reason to use one of these distributions over the standard Apache Hadoop distro?

Answer

Disclaimer: I interned at Cloudera this summer (but some of my best friends are at Yahoo! :-))

The Yahoo distribution is a version of Hadoop 0.20 that they run (ran?) on some subset of their clusters. It includes a set of patches for stability, bug fixes, etc. It is a source release; it does not have admin-friendly features like rpm or debian packages.

The Cloudera distribution is packaged as rpms and debs (the source is also available). This means you can get updates through your system's standard package manager. It also includes stability and bug fix patches. It is constantly maintained (not to say Yahoo's isn't -- I suppose one could just go on GitHub and check when they last updated it). It also packages Pig and Hive.

Cloudera's distribution of Hadoop 0.20 is in beta, and 0.18 is considered stable (more on this on the Cloudera blog). The 0.18 version also includes packages for Hive and Pig; for 0.20, you have to build them yourself (there aren't official releases of Pig or Hive that support 0.20 yet, although patches exist). There may well be significant overlap between the Cloudera and Yahoo versions of 0.20; both provide manifests, so you can check. The latest documentation for Cloudera's distros is at http://archive.cloudera.com
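If you want to check that overlap yourself, here is a rough sketch of one way to do it. It assumes each manifest is plain text that mentions patches by their Apache JIRA keys (e.g. `HADOOP-1234`); the actual manifest formats may differ, and the function names and sample data below are made up for illustration.

```python
import re

# Matches Apache JIRA issue keys such as HADOOP-1234, HDFS-56 or MAPREDUCE-789.
# Assumption: the manifests identify patches by these keys.
JIRA_KEY = re.compile(r"\b(?:HADOOP|HDFS|MAPREDUCE)-\d+\b")


def jira_keys(manifest_text):
    """Return the set of JIRA issue keys mentioned in a manifest's text."""
    return set(JIRA_KEY.findall(manifest_text))


def compare_manifests(text_a, text_b):
    """Split patches into those in both manifests and those unique to each."""
    a, b = jira_keys(text_a), jira_keys(text_b)
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


if __name__ == "__main__":
    # Made-up manifest snippets, not real patch lists.
    manifest_a = "HADOOP-1111 fix foo\nHDFS-22 fix bar\n"
    manifest_b = "HDFS-22 fix bar\nMAPREDUCE-333 fix baz\n"
    print(compare_manifests(manifest_a, manifest_b))
```

In practice you would read the two manifest files from disk and pass their contents to `compare_manifests`; the size of the shared list gives a quick feel for how far the two distributions have diverged.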

Yahoo does not provide support for their distribution; they provide their patched version as a service to the community, so the folks who are interested can build what Yahoo runs internally. Given the size of Yahoo clusters, that's a significant contribution, especially if you aren't a Hadoop developer who follows the JIRAs all the time. Cloudera supports their distribution commercially, as well as providing some community support via the Hadoop mailing lists and, for distro-specific issues, on their GetSatisfaction page.

Both are pretty different from the vanilla Apache distro, since they apply patches in between Apache releases (the Cloudera version of 0.20 has 60+ patches!).
