Is Hadoop in a Docker container faster/worth it?

Question

I have a Hadoop-based environment. I use Flume, Hue, and Cassandra in this system. There is a lot of hype around Docker nowadays, so I would like to examine the pros and cons of dockerizing in this case. I think it should be much more portable, but the environment can already be set up with Cloudera Manager in a few clicks. Would it be faster, or why would it be worth it? What are the advantages? Or should only the multi-node Cassandra cluster be dockerized?

Answer

Would it be faster, or why would it be worth it?

It sounds like you already have a Hadoop cluster. So you have to ask yourself, how long does it take to reproduce this environment? How often do you need to reproduce this environment?

If you do not need a way to reproduce the environment repeatedly and to contain dependencies that may conflict with other applications on the host, then I don't yet see a use case for you.
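One rough way to put numbers on those two questions is a break-even estimate. The sketch below is only an illustration: the function name and every figure in it are hypothetical, not measurements from any real cluster.

```python
import math


def break_even_rebuilds(setup_hours, manual_hours_per_rebuild, container_hours_per_rebuild):
    """Return how many rebuilds it takes for containerizing to pay off.

    All inputs are hypothetical estimates in hours. Returns None when a
    containerized rebuild is not faster than the current process.
    """
    saved_per_rebuild = manual_hours_per_rebuild - container_hours_per_rebuild
    if saved_per_rebuild <= 0:
        return None  # no time saved, so the up-front effort is never recovered
    return math.ceil(setup_hours / saved_per_rebuild)


# Assumed numbers: 40 hours to containerize, 2 hours per rebuild today,
# 0.5 hours per rebuild once containerized.
print(break_even_rebuilds(40, 2, 0.5))
```

If you only rebuild the environment a handful of times a year, a result in the dozens suggests the effort will not pay for itself on speed alone.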

What are advantages?

If you are running Hadoop in an environment where you may need mixed Java versions, then running it as a container could isolate the dependencies (in this case, Java) from the host system. In some cases, it would also give you a more easily reproducible artifact to move around and set up. But Java apps are already fairly simple to deploy, since their dependencies are typically bundled in the JAR.
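To make the Java isolation point concrete, here is a minimal sketch that builds a `docker run` command for running a JAR against the Java version baked into an image rather than the one on the host. The helper name, the JAR path, and the choice of the `eclipse-temurin:8-jre` image are my assumptions for illustration; the sketch only prints the command and does not run Docker.

```python
import shlex


def java_in_container_cmd(image, host_jar, container_jar="/app/app.jar"):
    """Build a docker command that runs a JAR with the image's Java runtime.

    The host needs Docker but no particular Java version; the JAR is
    bind-mounted read-only into the container.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{host_jar}:{container_jar}:ro",
        image,
        "java", "-jar", container_jar,
    ]


# Hypothetical job JAR pinned to a Java 8 runtime image.
cmd = java_in_container_cmd("eclipse-temurin:8-jre", "/opt/jobs/my-job.jar")
print(shlex.join(cmd))
```

Swapping the image tag is all it takes to try the same JAR on a different Java version, which is the kind of isolation that is awkward to get on a shared host.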

Or should only the multi-node Cassandra cluster be dockerized?

I don't think it really comes down to whether it is a multi-node environment or not. It comes down to the problems it solves. It doesn't sound like you have any pain points in deploying or reproducing Hadoop environments (yet), so I don't see the need to "dockerize" something just because it is the hot new thing on the block.

When you do have the need to reproduce the Hadoop environment easily, you might look at Docker for some of the orchestration and management tools (Kubernetes, Rancher, etc.) which make deploying and managing clusters of applications on an overlay network much more appetizing than just regular Docker. Docker is just the tool in my eyes. It really starts to shine when you can leverage some of the neat overlay multi-host networking, discovery, and orchestration that other packages are building on top of it.