How to make HDFS work in docker swarm

Question

I have troubles to make my HDFS setup work in docker swarm. To understand the problem I've reduced my setup to the minimum :

  • 1 physical machine
  • 1 namenode
  • 1 datanode

This setup is working fine with docker-compose, but it fails with docker-swarm, using the same compose file.

Here is the compose file:

version: '3'
services:
  namenode:
    image: uhopper/hadoop-namenode
    hostname: namenode
    ports:
      - "50070:50070"
      - "8020:8020"
    volumes:
      - /userdata/namenode:/hadoop/dfs/name
    environment:
      - CLUSTER_NAME=hadoop-cluster

  datanode:
    image: uhopper/hadoop-datanode
    depends_on:
      - namenode
    volumes:
      - /userdata/datanode:/hadoop/dfs/data
    environment:
      - CORE_CONF_fs_defaultFS=hdfs://namenode:8020

To test it, I have installed a Hadoop client on my host (physical) machine with only this simple configuration in core-site.xml:

<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://0.0.0.0:8020</value></property>
</configuration>

Then I run the following command:

hdfs dfs -put test.txt /test.txt

With docker-compose (just running docker-compose up) it works and the file is written to HDFS.

With docker swarm, I'm running:

docker swarm init 
docker stack deploy --compose-file docker-compose.yml hadoop

Then, when all services are up and I put my file on HDFS, it fails like this:

INFO hdfs.DataStreamer: Exception in createBlockOutputStream
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/x.x.x.x:50010]
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
        at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:259)
        at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1692)
        at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1648)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:704)
18/06/14 17:29:41 WARN hdfs.DataStreamer: Abandoning BP-1801474405-10.0.0.4-1528990089179:blk_1073741825_1001
18/06/14 17:29:41 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[10.0.0.6:50010,DS-d7d71735-7099-4aa9-8394-c9eccc325806,DISK]
18/06/14 17:29:41 WARN hdfs.DataStreamer: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /test.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
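
A quick, hedged way to confirm what the trace suggests (the client cannot reach the datanode's data-transfer port) is to probe the address and port reported in the log from the host. This assumes nc is available on the host; the address 10.0.0.6 and port 50010 are the ones from the log above.

nc -vz 10.0.0.6 50010   # expected to time out from the host, since 10.0.0.6 is an overlay-network address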

If I look in the web UI, the datanode seems to be up and no issue is reported...

Update: it seems that depends_on is ignored by swarm, but it does not seem to be the cause of my problem: I restarted the datanode after the namenode was up, but it did not work any better.
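
As an aside, since swarm ignores depends_on, startup ordering is usually handled inside the container itself rather than by the orchestrator. Below is a minimal sketch of such a wait-before-start wrapper; the script name, the namenode URL, the use of curl and the way the real start command is passed in are all assumptions for illustration, not part of the uhopper images.

#!/bin/sh
# wait-for-namenode.sh (hypothetical): block until the namenode web UI answers,
# then hand over to whatever start command was passed as arguments.
NAMENODE_URL="http://namenode:50070"
until curl -sf "$NAMENODE_URL" > /dev/null; do
  echo "waiting for namenode at $NAMENODE_URL..."
  sleep 5
done
exec "$@"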

Thanks for your help :)

Answer

The whole mess stems from the interaction between docker swarm's overlay networks and how the HDFS namenode keeps track of its datanodes. The namenode records the datanode IPs/hostnames based on the datanodes' overlay-network IPs. When the HDFS client asks for read/write operations directly on the datanodes, the namenode reports back the datanodes' overlay-network IPs/hostnames. Since the overlay network is not accessible to external clients, any read/write operation will fail.
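
One way to observe this from the host (a hedged illustration, assuming the Hadoop client and the core-site.xml from the question) is to ask the namenode which addresses it has registered for its datanodes; with the overlay setup they come back as overlay addresses such as 10.0.0.x, which are not routable from outside the swarm.

hdfs getconf -confKey fs.defaultFS   # confirm which namenode the client is talking to
hdfs dfsadmin -report                # lists each registered datanode and the address it advertises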

The final solution I used (after lots of struggling to get the overlay network to work) was to have the HDFS services use the host network. Here's a snippet from the compose file:

version: '3.7'

x-deploy_default: &deploy_default
  mode: replicated
  replicas: 1
  placement:
    constraints:
      - node.role == manager
  restart_policy:
    condition: any
    delay: 5s

services:
  hdfs_namenode:
    deploy:
      <<: *deploy_default
    networks:
      hostnet: {}
    volumes:
      - hdfs_namenode:/hadoop-3.2.0/var/name_node
    command:
      namenode -fs hdfs://${PRIMARY_HOST}:9000
    image: hadoop:3.2.0

  hdfs_datanode:
    deploy:
      mode: global
    networks:
      hostnet: {}
    volumes:
      - hdfs_datanode:/hadoop-3.2.0/var/data_node
    command:
      datanode -fs hdfs://${PRIMARY_HOST}:9000
    image: hadoop:3.2.0
volumes:
  hdfs_namenode:
  hdfs_datanode:

networks:
  hostnet:
    external: true
    name: host
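
For completeness, a sketch of deploying and testing this stack from the host. PRIMARY_HOST is the variable used in the snippet above; the exact value exported for it, and the need to point the client's fs.defaultFS at hdfs://<PRIMARY_HOST>:9000 instead of port 8020, are assumptions drawn from the snippet rather than from the original answer text.

export PRIMARY_HOST=$(hostname)   # assumption: this hostname is reachable by the HDFS client
docker stack deploy --compose-file docker-compose.yml hadoop
hdfs dfs -put test.txt /test.txt  # should now succeed: the datanode advertises a host-network address

With the services on the host network, the namenode hands back addresses that an external client can actually reach, which is exactly what the overlay setup could not provide.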
