Starting h2o in hadoop cluster with specific connection node url


Problem description

Is there a way to start an h2o instance interface on a specific node of a cluster? For example...

when using the command:

$ hadoop jar h2odriver.jar -nodes 4 -mapperXmx 6g -output hdfsOutputDir

from, say, the h2o install directory on, say, node 172.18.4.62, I get the (abridged) output:

....
H2O node 172.18.4.65:54321 reports H2O cluster size 1
H2O node 172.18.4.66:54321 reports H2O cluster size 1
H2O node 172.18.4.67:54321 reports H2O cluster size 1
H2O node 172.18.4.63:54321 reports H2O cluster size 1
H2O node 172.18.4.63:54321 reports H2O cluster size 4
H2O node 172.18.4.66:54321 reports H2O cluster size 4
H2O node 172.18.4.67:54321 reports H2O cluster size 4
H2O node 172.18.4.65:54321 reports H2O cluster size 4
H2O cluster (4 nodes) is up
(Note: Use the -disown option to exit the driver after cluster formation)

Open H2O Flow in your web browser: http://172.18.4.65:54321

(Press Ctrl-C to kill the cluster)
Blocking until the H2O cluster shuts down...

And from a python script that wants to connect to the h2o instance, I would do something like:

h2o.init(ip="172.18.4.65")

to connect to the h2o instance. However, it would be better to be able to control which address the h2o instance connection sits at.

Is there a way to do this? Is this question confused/wrong-headed? My overall goal is to have the python script run periodically: start an h2o cluster, do stuff on that cluster, then shut the cluster down (not being able to know the address to use to connect to the cluster means the script would never be sure which address to connect to). Any advice would be appreciated. Thanks.
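
For concreteness, the periodic job would be something like the sketch below; the IP and port are placeholders copied from the driver output above, since the whole problem is that they are not known ahead of time:

import h2o

# Sketch of the periodic job: attach to the cluster, do the work, shut it down.
# The IP/port are placeholders taken from the sample driver output above.
h2o.init(ip="172.18.4.65", port=54321)   # connect to the running H2O cluster

# ... import data, train models, export results ...

h2o.cluster().shutdown()                 # tear the cluster down when finished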

Recommended answer

When you start an H2O cluster on Hadoop as below:

$ hadoop jar h2odriver.jar -nodes 3 -mapperXmx 10g -output /user/test

you will get output like the following just after the command is executed:

Determining driver host interface for mapper->driver callback...
    [Possible callback IP address: x.x.x.217]
    [Possible callback IP address: 127.0.0.1]
Using mapper->driver callback IP address and port: x.x.x.217:39562

(You can override these with -driverif and -driverport/-driverportrange.)

As you can see, the callback IP address is selected by the Hadoop runtime. So in most cases the IP address and the port are chosen by the Hadoop runtime to find the best available option.

You can also see the option of using -driverif x.x.x.x -driverport NNNNN along with the hadoop command; however, I am not sure this is really a good option. I haven't tested it with anything other than the IP of the node from which I launch the cluster, but it does work with the IP where the command is launched.
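
For reference, appending those flags to the command from the question would look roughly like the line below; the interface IP and port here are illustrative placeholders taken from the question and the sample output, not values I have verified:

$ hadoop jar h2odriver.jar -nodes 4 -mapperXmx 6g -output hdfsOutputDir -driverif 172.18.4.62 -driverport 39562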

Based on my experience, the most popular way to start an H2O cluster on Hadoop is to let Hadoop decide the addresses; the script just needs to parse the output line shown below:

Open H2O Flow in your web browser: x.x.x.x:54321

Parse the above line to get the IP address/port of the driver to connect to from the R/Python API.
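
A minimal sketch of that parsing step in Python; launching the driver through subprocess and the regular expression used here are illustrative assumptions, not an official h2o recipe:

import re
import subprocess

import h2o

# Launch the driver and read its stdout; the command is copied from the question above.
proc = subprocess.Popen(
    ["hadoop", "jar", "h2odriver.jar", "-nodes", "4",
     "-mapperXmx", "6g", "-output", "hdfsOutputDir"],
    stdout=subprocess.PIPE, text=True)

ip, port = None, None
for line in proc.stdout:                      # scan the driver output as it appears
    print(line, end="")
    m = re.search(r"Open H2O Flow in your web browser:\s*(?:https?://)?([\d.]+):(\d+)", line)
    if m:
        ip, port = m.group(1), int(m.group(2))
        break                                  # cluster is up and we know its address

if ip is None:
    raise RuntimeError("Did not find the H2O Flow address in the driver output")

h2o.init(ip=ip, port=port)                     # connect to the reported node
# ... do stuff on the cluster ...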

