Hadoop cluster configuration with Ubuntu Master and Windows slave

Problem description

Hi, I am new to Hadoop.

Hadoop version: 2.2.0

Goals:

1. Set up Hadoop standalone - Ubuntu 12 (Completed)
2. Set up Hadoop standalone - Windows 7 (Cygwin used only for sshd) (Completed)
3. Set up a cluster with an Ubuntu master and a Windows 7 slave (mostly for learning purposes and for setting up a development env) (Stuck)

Setup, in relation to the questions below:

• Master running on Ubuntu with Hadoop 2.2.0
• Slave running on Windows 7 with a self-compiled build from the Hadoop 2.2.0 source; Cygwin is used only for sshd
• Passwordless login is set up, and I am able to log in both ways using ssh from outside Hadoop. Since my Ubuntu and Windows machines have different usernames, I have set up a config file in the .ssh folder that maps hosts to users (a sketch follows this list)
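
For reference, a minimal sketch of that kind of ~/.ssh/config on the master. The hostname, address, and username below are placeholders, not details taken from the post:

    # ~/.ssh/config — map a slave host to the account to log in as
    Host windows-slave
        HostName 192.168.1.20
        User winuser
        IdentityFile ~/.ssh/id_rsa

With an entry like this, a plain "ssh windows-slave" picks up the right username automatically.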

Questions:

1. In a cluster, does the username on the master need to be the same as on the slave? The reason I am asking is that, after configuring the cluster, when I try to use start-dfs.sh the logs say it was able to ssh into the slave nodes but could not find the location "/home/xxx/hadoop/bin/hadoop-daemon.sh" on the slave. The "xxx" is my master username, not the slave one. Also, since my slave is a pure Windows build, the install is under C:/hadoop/... Does the master look at the env variable $HADOOP_HOME to check where the install is on the slave? Are there any other env variables I need to set? (See the sketch after these questions.)

2. My goal was to use the Windows Hadoop build on the slave, since Hadoop officially supports Windows now. But is it better to run the Linux build under Cygwin to accomplish this? The question comes up because I am seeing that start-dfs.sh tries to execute hadoop-daemon.sh and not some *.cmd (again, see the sketch after these questions).

3. If this setup works out, a question I may run into in the future is whether Pig, Mahout, etc. will run in this kind of setup, as I have not seen builds of Pig or Mahout for Windows. Do these components need to be present only on the master node, or do they need to be on the slave nodes too? While experimenting with standalone mode I saw two ways of running Mahout: first via the mahout script, which I was able to use on Linux, and second via the yarn jar command, passing in the Mahout jar, which I used with the Windows version. If Mahout/Pig (when using the provided sh script) assumes the slaves already have the jars in place, then the Ubuntu + Windows combo does not seem to work. Please advise.
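
A rough sketch of why questions 1 and 2 come up: the sh-based start scripts build the daemon command from the master's own environment and replay it verbatim on every slave. This is a simplified paraphrase of what sbin/slaves.sh does in Hadoop 2.x, not the exact source:

    # Simplified paraphrase of Hadoop's sbin/slaves.sh dispatch loop.
    # $HADOOP_HOME here is the *master's* install path, which is why a
    # Linux path like /home/xxx/hadoop/... gets looked up on the slave,
    # and why hadoop-daemon.sh (never a *.cmd) is what the slave must run.
    for slave in $(grep -v '^#' "$HADOOP_CONF_DIR/slaves"); do
        ssh $HADOOP_SSH_OPTS "$slave" "$HADOOP_HOME/sbin/hadoop-daemon.sh" start datanode &
    done
    wait

In other words, nothing on the master consults the slave's environment: the same sh script path is executed remotely, so a Windows slave driven by start-dfs.sh still needs a working sh-style layout at that path.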

As I mentioned, this is more of an experiment than an implementation plan. Our final env will be completely on Linux. Thank you for your suggestions.

Solution

I have only worked with the same username. In general, SSH allows logging in under a different name with the -l option, but this might get tricky. You have to list your slaves in the slaves file (an example follows).
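
For illustration, a minimal slaves file; the hostnames are placeholders:

    # $HADOOP_HOME/etc/hadoop/slaves on the master — one worker host per line
    windows-slave
    linux-slave.example.com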

At least in the manual at https://hadoop.apache.org/docs/r0.19.1/cluster_setup.html#Slaves I did not find anything about adding usernames. It might be worth trying to add -l login_name to the slave node entry in the slaves conf file and seeing if it works; two untested variants are sketched below.
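
Both variants below are assumptions to verify, not documented behavior. Since the start scripts pass each slaves-file entry straight to ssh, the usual user@host form may also work:

    # Variant 1: per-host login name in etc/hadoop/slaves (untested)
    winuser@windows-slave

    # Variant 2: a global login name via ssh options, e.g. in hadoop-env.sh (untested)
    export HADOOP_SSH_OPTS="-l winuser"

HADOOP_SSH_OPTS is the hook the stock scripts already read for extra ssh flags, so variant 2 keeps the slaves file untouched at the cost of forcing one login name for all slaves.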
