How to deploy a distributed H2O Flow cluster with Docker?


Problem description

I'm able to deploy an H2O cluster on EC2 instances by putting the private IPs in the flatfile. I'm trying to do the same with Docker, but I can't figure out what to enter into the flatfile so the nodes can form the cluster. The private IP the container is running on does not work.

Recommended answer

Ultimately, the solution for running H2O in Docker may be to use a network plugin like Weave, because Weave can use multicasting (unlike Docker overlay).

But I managed to hack together a solution for running H2O in a Docker swarm on an overlay network with a flatfile. The issue with running in swarm is that Docker assigns each H2O service two IP addresses: one resolvable as the stack service name, and another that the instance sees as its $HOSTNAME address. H2O needs to use the $HOSTNAME IP, but it is difficult to determine this IP in advance for the flatfile. So instead, pass a config file containing the stack service names, and then change them to IP addresses with a script before launching H2O in each instance.
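To see the two addresses in question, you can compare what a service name resolves to against the container's own address from inside a running task. A quick check (a sketch; h2o_worker1 is one of the service names defined in the compose file below):

dig +short h2o_worker1    # the VIP that the service name resolves to
hostname -i               # the container's own ($HOSTNAME) IP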

So, for example, use a docker-compose file that defines three services:

version: "3.3"  # configs require compose file format 3.3 or newer

services:
  h2o_worker1:
    image: [h2o image]
    configs:
      - source: flatfile
        target: /flatfile
    deploy:
      placement:
        constraints:
          - node.hostname == [node1]
    ... 
  h2o_worker2:
    image: [h2o image]
    configs:
      - source: flatfile
        target: /flatfile
    deploy:
      placement:
        constraints:
          - node.hostname == [node2]
    ... 
  h2o_worker3:
    image: [h2o image]
    configs:
      - source: flatfile
        target: /flatfile
    deploy:
      placement:
        constraints:
          - node.hostname == [node3]
    ... 

##### Configs #####
configs:
  flatfile:
    file: flatfile

Where ... stands for the other docker compose parameters you need to enter, and [] marks values you need to define for your setup; a sketch of what those parameters might include follows below.
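For illustration only, one service might be fleshed out like this, with the entrypoint script and an attachable overlay network (the network name h2o_net and the path /entrypoint.sh are hypothetical placeholders, not from the original answer):

  h2o_worker1:
    image: [h2o image]
    entrypoint: ["bash", "/entrypoint.sh"]   # runs the IP-rewriting script shown further down
    networks:
      - h2o_net

networks:
  h2o_net:
    driver: overlay   # the swarm services communicate over this overlay network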

Now create a flatfile based on the service names; it will be imported through the config:

h2o_worker1:54321
h2o_worker2:54321
h2o_worker3:54321

Obviously, change the ports if necessary. Then use an entrypoint script to look up each service name's IP and add 1 to get the $HOSTNAME IP for each service. I just use sleep here to make sure all the services have started so that the IP lookup works. Docker always appears to assign the two IPs per service sequentially, but YMMV. As I said, this is a hack and probably not a great production-level solution. My entrypoint script looks something like this:

echo "Moving flatfile to ${H2O_HOME}"
cp /flatfile ${H2O_HOME}

sleep 60
echo "Replacing hostnames in flatfile with IP addresses."
grep -o -P '.*(?=:)' ${H2O_HOME}/flatfile > ${H2O_HOME}/hostnames
grep -o -P '(?<=:).*' ${H2O_HOME}/flatfile > ${H2O_HOME}/ports
dig +short $(cat ${H2O_HOME}/hostnames) > ${H2O_HOME}/hostnames_ip
cat ${H2O_HOME}/hostnames_ip | awk -F"." '{printf "%d.%d.%d.%d\n", $1, $2, $3, $4 + 1}' > ${H2O_HOME}/new_ips
paste -d ":" ${H2O_HOME}/new_ips ${H2O_HOME}/ports > ${H2O_HOME}/new_flatfile

echo "Starting H2O..."
bash -c "java -Xmx${H2O_NODE_MEMORY:-1g} -jar ${H2O_HOME}/h2o.jar -flatfile ${H2O_HOME}/new_flatfile"

The key here is using dig to retrieve the IP address for each service host and then incrementing it by one to get the secondary address that we need to pass to H2O. Note that I define an environment variable in my Dockerfile so I can vary the node memory in the docker compose file; you don't need to do that. The Dockerfile also sets a variable for H2O's install location, to simplify things.
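For reference, those Dockerfile variables might be set up roughly like this (a minimal sketch; the base image, file locations, and package install are assumptions, not taken from the original answer):

FROM openjdk:8-jre
# dig (used by the entrypoint script above) lives in dnsutils on Debian-based images.
RUN apt-get update && apt-get install -y dnsutils && rm -rf /var/lib/apt/lists/*
# Variables referenced by the entrypoint script; the values are illustrative.
ENV H2O_HOME=/opt/h2o
ENV H2O_NODE_MEMORY=1g
COPY h2o.jar ${H2O_HOME}/h2o.jar
COPY entrypoint.sh /entrypoint.sh
ENTRYPOINT ["bash", "/entrypoint.sh"]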

This lets me deploy the containers using docker swarm, and H2O in fact finds all the nodes correctly. Because H2O does not permit adding or removing nodes after the initial setup, having to define most of this in advance is not a big deal (at least for me). That said, I may yet try Weave or another network plugin that avoids some of these issues.
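Putting it together, the stack can then be deployed and checked with standard swarm commands (the stack name h2o is illustrative):

docker stack deploy -c docker-compose.yml h2o   # create the three services
docker service ls                               # confirm each service shows 1/1 replicas
# H2O's cloud status endpoint should then report all three nodes:
curl http://[manager-ip]:54321/3/Cloud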
