Hadoop virtual cluster vs single machine

Question

I have a question about the speed and performance of running multiple virtualized nodes on a single machine vs. running a single node directly on that machine.

Which one will perform better?

I ask because I am currently learning Hadoop on a single machine, and I have seen some tutorials online that show multiple virtualized nodes running on one machine.

Thank you in advance

Solution

There is always some overhead that comes with virtualization, so unless it is really necessary, I wouldn't advise running Hadoop in a virtualized environment.
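To make the overhead point concrete, here is a back-of-envelope sketch. The figures (a 16 GB host, 1 GB reserved per guest OS) and the function name `usable_memory_gb` are hypothetical assumptions chosen for illustration, not measurements. The only point it shows is that VMs carved out of one machine share the same physical RAM, so every guest OS subtracts from what is left for Hadoop itself.

```python
# Back-of-envelope illustration only: every number here is a hypothetical
# assumption, not a benchmark result.


def usable_memory_gb(host_ram_gb: float, num_vms: int, guest_os_overhead_gb: float) -> float:
    """Return the RAM left for Hadoop after each guest OS takes its share.

    num_vms == 0 means the Hadoop daemons run directly on the host (for
    example in pseudo-distributed mode), so no guest-OS overhead is paid.
    The host OS's own usage is paid in both setups, so it is ignored here.
    """
    if num_vms < 0:
        raise ValueError("num_vms must be non-negative")
    return host_ram_gb - num_vms * guest_os_overhead_gb


HOST_RAM_GB = 16.0          # hypothetical single machine
GUEST_OS_OVERHEAD_GB = 1.0  # hypothetical RAM reserved by each guest OS

for vms in (0, 3):
    left = usable_memory_gb(HOST_RAM_GB, vms, GUEST_OS_OVERHEAD_GB)
    print(f"{vms} VM(s): {left:.1f} GB left for Hadoop")
```

With these made-up figures it prints 16.0 GB for the native setup and 13.0 GB for three VMs. The same reasoning applies to CPU cores and disk: splitting one box into several VMs never adds capacity, it only adds layers that consume some of it.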

That being said, I know VMware did a lot of work on making Hadoop run well in virtualized environments, and they have published some benchmarks in which they claim that, under certain conditions, Hadoop performs better in VMs than as a native application. I haven't played much with vSphere, but it could be something to look at if you want to explore virtualization further. Don't take those numbers for granted, though: the results depend heavily on the type of hardware you're running. In some conditions you might gain some performance with VMs, but from experience I'd guess that in most cases you won't gain anything.

If you're just getting started and testing with Hadoop, I think virtualizing is overkill. You can very easily run Hadoop in pseudo-distributed mode, which means running multiple Hadoop daemons on the same box, each as a separate process. That's how I got started with Hadoop, and it's a good head start. You can find more info here (or you may need a different page, depending on which Hadoop version you're running).
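As a rough illustration of what pseudo-distributed mode needs, the sketch below renders the two minimal settings that the Apache Hadoop "Single Node Cluster" guide uses for Hadoop 2.x/3.x: `fs.defaultFS` pointing at a local HDFS NameNode, and `dfs.replication` set to 1. The helper name `hadoop_config_xml` and the port 9000 are assumptions for illustration; older 1.x releases used `fs.default.name` and a `conf/` directory instead, so check the docs for your version.

```python
import xml.etree.ElementTree as ET

# Minimal pseudo-distributed settings (Hadoop 2.x/3.x naming assumed).
# Older 1.x releases used fs.default.name instead of fs.defaultFS.
CORE_SITE = {"fs.defaultFS": "hdfs://localhost:9000"}
# Only one DataNode runs on the box, so each block can have only one replica.
HDFS_SITE = {"dfs.replication": "1"}


def hadoop_config_xml(properties: dict[str, str]) -> str:
    """Render a dict as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Copy these into etc/hadoop/core-site.xml and etc/hadoop/hdfs-site.xml
    # under your Hadoop install (2.x/3.x directory layout assumed).
    print(hadoop_config_xml(CORE_SITE))
    print(hadoop_config_xml(HDFS_SITE))
```

After formatting HDFS (`bin/hdfs namenode -format`) and starting it (`sbin/start-dfs.sh`), `jps` should list NameNode, DataNode and SecondaryNameNode as separate JVM processes on the same machine, which is the "multiple daemons on one box" setup described above.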

If you get to the point where you want to test with a real cluster but don't have the resources, I would suggest looking at Amazon Elastic MapReduce (EMR): it gives you a cluster on demand, and it's pretty cheap. That way you can run more advanced tests. More info here.

The bottom line: if the purpose is simply testing, I don't think you really need a virtual cluster.
