Multi-node cluster installation with h2o on AWS EC2


Question

I was wondering how to set up an h2o cluster using multiple AWS EC2 instances and R-Studio. I am not a computer scientist, so sorry for the trivial questions (!)

Based on this tutorial (http://amunategui.github.io/h2o-on-aws/) I successfully installed h2o and R-Studio on an AWS EC2 instance (Linux). However, I would rather create a multi-instance cluster with, let's say, 4 instances with 8 cores each.

Following this document (http://h2o-release.s3.amazonaws.com/h2o/rel-lambert/5/docs-website/deployment/multinode.html), I need a flatfile.txt in which I list the IP and port of each EC2 instance. In the next step, I have to copy this file to each node in the cluster, and afterwards I need to start the cluster via the java command line... Since I am not a computer scientist, as I already mentioned, some questions emerged:

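  1. Where can I find the IP and port of each h2o instance?
  2. How exactly do I copy the generated file to each node?
  3. From step 5 onwards I am completely confused: where do I have to insert this line, and where can I find the java command line?
  4. I do not want to use the h2o Web UI, so how can I access the cluster from R-Studio (which is installed on one of the instances)?

Thank you very much!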

Recommended answer

1a. Where to get the IPs? You are told them as you create each EC2 instance. It is the private IP you want (normally starting with 172). (BTW, make sure you create all the instances in the same availability zone.)

1b. Use 54321 as the port. So your flatfile.txt for 3 nodes might look like:

172.31.1.123:54321
172.31.2.237:54321
172.44.99.99:54321
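
If you would rather not type the file by hand, the same flatfile can be generated from a list of IPs. This is only a minimal sketch: the function name make_flatfile is made up for illustration, and the addresses are the example ones from above, so substitute the private IPs of your own instances.

```shell
#!/bin/sh
# Write one "ip:port" line per node into a flatfile.
# make_flatfile is a hypothetical helper name; the IPs are the example
# addresses from the answer and must be replaced with your own private IPs.
H2O_PORT=54321

make_flatfile() {
    # $1 = output file, remaining arguments = private IPs of the nodes
    out="$1"
    shift
    : > "$out"                      # create the file, or empty it if it exists
    for ip in "$@"; do
        printf '%s:%s\n' "$ip" "$H2O_PORT" >> "$out"
    done
}

make_flatfile flatfile.txt 172.31.1.123 172.31.2.237 172.44.99.99
cat flatfile.txt
```

Every node gets an identical copy of this file, which is why building it once and distributing it is less error-prone than editing it on each machine.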

2. You might make the flatfile.txt on your notebook, then scp it to the home directory on each node. (Use the public IP for scp.)
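
The copy step can be scripted as a loop over the public IPs. The sketch below is an illustration under assumptions: the key path, the ec2-user login name, and the 203.0.113.x addresses are all placeholders, and copy_flatfile is a made-up helper name. It defaults to a dry run that only prints the scp commands, so nothing is sent until you set DRY_RUN=0.

```shell
#!/bin/sh
# Copy flatfile.txt to the home directory of each node.
# Assumptions (not from the answer): key path, user name, and public IPs
# below are placeholders; copy_flatfile is a hypothetical helper.
KEY="${KEY:-$HOME/.ssh/my-key.pem}"
REMOTE_USER="${REMOTE_USER:-ec2-user}"
DRY_RUN="${DRY_RUN:-1}"             # 1 = only print the commands

copy_flatfile() {
    # Arguments: the public IP of each node
    for host in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "scp -i $KEY flatfile.txt $REMOTE_USER@$host:~/"
        else
            scp -i "$KEY" flatfile.txt "$REMOTE_USER@$host:~/"
        fi
    done
}

copy_flatfile 203.0.113.10 203.0.113.11 203.0.113.12
```

Note the asymmetry: scp from your notebook goes to the public IPs, while the flatfile itself lists the private IPs the nodes use to reach each other.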

3. ssh in to each machine in turn, and then type the java command from the home directory, e.g.

 java -Xmx20g -jar h2o.jar -flatfile flatfile.txt -port 54321
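
A process started by typing the command in an ssh session is tied to that session. If you want the node to keep running after you log out, one option is to launch it in the background with nohup. This is a sketch rather than part of the original answer: start_h2o is a made-up helper, and the HEAP variable and the h2o.log file name are assumptions. The java options themselves are the ones shown above.

```shell
#!/bin/sh
# Start one H2O node in the background so it survives logout.
# start_h2o, HEAP, and h2o.log are illustrative assumptions; the java
# options are the same as in the command from the answer.
HEAP="${HEAP:-20g}"

start_h2o() {
    # Refuse to start if the jar or the flatfile is missing from this directory.
    for f in h2o.jar flatfile.txt; do
        if [ ! -f "$f" ]; then
            echo "missing $f in $(pwd)" >&2
            return 1
        fi
    done
    nohup java "-Xmx$HEAP" -jar h2o.jar -flatfile flatfile.txt -port 54321 \
        > h2o.log 2>&1 &
    echo "started H2O with PID $!"
}

# Usage on each node, from the home directory:
#   start_h2o
```

The startup messages then end up in h2o.log, which is where you would look to see whether the nodes found each other.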

4. First make sure port 8787 (the port RStudio Server listens on by default) is open in your Amazon firewall (aka "security group"). Once you've made sure the H2O cluster is running (and assuming you have installed the H2O R package, and made sure it is exactly the same version as H2O on each node in your cluster), you simply do:

library(h2o)
h2o.init()

The h2o.init() call looks on the local machine for any node in the cluster.

Aside:

What I have been using are the scripts found here:

https://github.com/h2oai/h2o-3/tree/master/ec2

They do almost all the steps for you, including making the flatfile, distributing it, and starting H2O on each node. You still need to set up a security group (well, optionally, I suppose: the script default is to have no security group!), and you need to set a password for the user you will use to log in to RStudio. You also need to install the H2O R package (I think that could be done from inside RStudio, if you have an aversion to the command line).
