Hadoop Client Node Configuration


Problem description

Assume that there is a Hadoop cluster with 20 machines. Of those 20 machines, 18 are slaves, machine 19 runs the NameNode, and machine 20 runs the JobTracker.

Now, I know that the Hadoop software has to be installed on all 20 of those machines.

But my question is: which machine is involved in loading a file xyz.txt into the Hadoop cluster? Is that client machine a separate machine? Do we need to install the Hadoop software on that client machine as well? How does the client machine identify the Hadoop cluster?

Recommended answer

I am new to Hadoop, so this is based on my understanding:

If your data upload is not an actual service of the cluster (in which case it should run on an edge node of the cluster), then you can configure your own computer to work as an edge node.

An edge node does not need to be known by the cluster (except for security purposes), as it neither stores data nor computes jobs. That is basically what it means to be an edge node: it is connected to the Hadoop cluster but does not participate in it.

In case it helps someone, here is what I did to connect to a cluster that I don't administer:

  • Get an account on the cluster, say myaccount.
  • Create an account on your computer with the same name: myaccount.
  • Configure your computer to access the cluster machines (SSH without a passphrase, registered IPs, etc.).
  • Get the Hadoop configuration files from an edge node of the cluster.
  • Get a Hadoop distribution (e.g., from here).
  • Uncompress it where you want, say /home/myaccount/hadoop-x.x.
  • Add the following environment variables: JAVA_HOME and HADOOP_HOME (/home/myaccount/hadoop-x.x); see the sketch after this list.
  • (If you'd like) add the Hadoop bin directory to your path: export PATH=$HADOOP_HOME/bin:$PATH
  • Replace your Hadoop configuration files with those you got from the edge node. With Hadoop 2.5.2, they live in the folder $HADOOP_HOME/etc/hadoop.
  • Also, I had to change the value of a couple of $JAVA_HOME entries defined in the conf files. To find them, use: grep -r "export.*JAVA_HOME"
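For illustration, here is a minimal sketch of those environment variables as they might appear in ~/.bashrc; the JDK path and Hadoop version are assumptions, so adjust them to your machine:

    # Assumed locations; point JAVA_HOME at your JDK and HADOOP_HOME at your unpacked distribution
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
    export HADOOP_HOME=/home/myaccount/hadoop-2.5.2
    export PATH=$HADOOP_HOME/bin:$PATH

The configuration files copied from the edge node are also what let the client identify the cluster: core-site.xml tells the client where the NameNode is via the fs.defaultFS property. A minimal sketch, using a hypothetical hostname:

    <!-- $HADOOP_HOME/etc/hadoop/core-site.xml; namenode.example.com is a placeholder -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode.example.com:8020</value>
      </property>
    </configuration>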

Then run hadoop fs -ls /, which should list the root directory of the cluster's HDFS.
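From there, loading a file such as the xyz.txt from the question is just a client-side command; the target HDFS directory below is an assumption:

    # Upload xyz.txt from the client machine into HDFS; /user/myaccount is assumed to exist
    hadoop fs -put xyz.txt /user/myaccount/xyz.txt
    hadoop fs -ls /user/myaccount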

