Running Apache Hadoop 2.1.0 on Windows


Problem description


I am new to Hadoop and have run into problems trying to run it on my Windows 7 machine. In particular I am interested in running Hadoop 2.1.0, as its release notes mention that running on Windows is supported. I know that I can try to run 1.x versions on Windows with Cygwin, or even use a prepared VM from, for example, Cloudera, but for various reasons these options are less convenient for me.

Having examined a tarball from http://apache-mirror.rbc.ru/pub/apache/hadoop/common/hadoop-2.1.0-beta/ I found that there really are some *.cmd scripts that can be run without Cygwin. Everything worked fine when I formatted the HDFS partition, but when I tried to run the hdfs namenode daemon I faced two errors: the first, non-fatal, was that winutils.exe could not be found (it really wasn't present in the downloaded tarball). I found the sources of this component in the Apache Hadoop source tree and compiled it with the Microsoft SDK and MSBuild. Thanks to the detailed error message it was clear where to put the executable to satisfy Hadoop. But the second error, which is fatal, doesn't contain enough information for me to solve:

13/09/05 10:20:09 FATAL namenode.NameNode: Exception in namenode join
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:423)
    at org.apache.hadoop.fs.FileUtil.canWrite(FileUtil.java:952)
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:451)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:282)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:200)
...
13/09/05 10:20:09 INFO util.ExitUtil: Exiting with status 1

Looks like something else should be compiled. I'm going to try to build Hadoop from source with Maven, but isn't there a simpler way? Isn't there some option-I-know-not-of that can disable native code and make that tarball usable on Windows?

Thank you.

UPDATED. Yes, indeed. The "homebrew" package contained some extra files, most importantly winutils.exe and hadoop.dll. With these files the namenode and datanode started successfully. I think the question can be closed. I didn't delete it in case someone faces the same difficulty.
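As a quick sanity check for the situation above, a small helper script (not part of Hadoop; the file list is an assumption based on this thread, not an official requirement) can report which native Windows components are missing from a Hadoop bin directory:

```python
import os

# Native Windows components this thread found necessary for namenode/datanode
# startup; the list is an assumption based on the update above.
REQUIRED_NATIVE_FILES = ["winutils.exe", "hadoop.dll"]

def missing_native_files(bin_dir, required=REQUIRED_NATIVE_FILES):
    """Return the required native files that are absent from bin_dir."""
    return [name for name in required
            if not os.path.isfile(os.path.join(bin_dir, name))]
```

If this returns a non-empty list for your `%HADOOP_HOME%\bin`, an UnsatisfiedLinkError like the one above is likely.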

UPDATED 2. To build the "homebrew" package I did the following:

  1. Got the sources and unpacked them.
  2. Read BUILDING.txt carefully.
  3. Installed the dependencies:
    3a) Windows SDK 7.1
    3b) Maven (I used 3.0.5)
    3c) JDK (I used 1.7.25)
    3d) ProtocolBuffer (I used 2.5.0 - http://protobuf.googlecode.com/files/protoc-2.5.0-win32.zip). It is enough to just put the compiler (protoc.exe) into one of the PATH folders.
    3e) A set of UNIX command-line tools (I installed Cygwin)
  4. Started the Windows SDK command line: Start | All Programs | Microsoft Windows SDK v7.1 | ... Command Prompt (I modified this shortcut, adding the option /release on the command line to build release versions of the native code). All subsequent steps are performed inside the SDK command-line window.
  5. Set up the environment:

    set JAVA_HOME={path_to_JDK_root}

It seems that JAVA_HOME MUST NOT contain spaces!

set PATH={path_to_maven_bin};%PATH%  
set Platform=x64  
set PATH={path_to_cygwin_bin};%PATH%  
set PATH={path_to_protoc.exe};%PATH%  

  6. Changed dir to the sources root folder (BUILDING.txt warns that there are some limitations on the path length, so the sources root should have a short name - I used D:\hds).
  7. Ran the build process:

    mvn package -Pdist -DskipTests

You can try without 'skipTests', but on my machine some tests failed and the build was terminated. It may be connected to the symbolic link issues mentioned in BUILDING.txt.

  8. Picked up the result in hadoop-dist\target\hadoop-2.1.0-beta (Windows executables and DLLs are in the 'bin' folder).
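The environment gotchas above (a space-free JAVA_HOME, and mvn/protoc reachable on PATH) can be checked before kicking off the long build. This is a sketch of my own, not part of the official build procedure:

```python
import os
import shutil

def build_env_problems(env=os.environ):
    """Return a list of human-readable problems with the build environment."""
    problems = []
    java_home = env.get("JAVA_HOME", "")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif " " in java_home:
        # This thread found that a JAVA_HOME containing spaces breaks the build.
        problems.append("JAVA_HOME contains spaces: %r" % java_home)
    # Look up the build tools on the PATH taken from the given environment.
    path = env.get("PATH", "")
    for tool in ("mvn", "protoc"):
        if shutil.which(tool, path=path) is None:
            problems.append("%s not found on PATH" % tool)
    return problems
```

An empty result does not guarantee the build succeeds, but a non-empty one points at a setup step above that was skipped.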

Solution

I followed these steps to install Hadoop 2.2.0:

Steps to build Hadoop bin distribution for Windows

  1. Download and install Microsoft Windows SDK v7.1.

  2. Download and install the Unix command-line tools (Cygwin).

  3. Download and install Maven 3.1.1.

  4. Download Protocol Buffers 2.5.0 and extract to a folder (say c:\protobuf).

  5. Add the environment variables JAVA_HOME, M2_HOME and Platform if not added already. Note: the variable name Platform is case-sensitive, and its value will be either x64 or Win32 for building on a 64-bit or 32-bit system. Edit the Path variable to add the bin directory of Cygwin (say C:\cygwin64\bin), the bin directory of Maven (say C:\maven\bin) and the installation path of Protocol Buffers (say c:\protobuf).

  6. Download hadoop-2.2.0-src.tar.gz and extract it to a folder with a short path (say c:\hdfs) to avoid runtime problems due to the maximum path length limitation in Windows.

  7. Select Start --> All Programs --> Microsoft Windows SDK v7.1 and open Windows SDK 7.1 Command Prompt. Change directory to the Hadoop source code folder (c:\hdfs). Execute mvn package with options -Pdist,native-win -DskipTests -Dtar to create the Windows binary tar distribution.

  8. If everything goes well in the previous step, then the native distribution hadoop-2.2.0.tar.gz will be created inside the C:\hdfs\hadoop-dist\target\hadoop-2.2.0 directory.

Install Hadoop

  1. Extract hadoop-2.2.0.tar.gz to a folder (say c:\hadoop).

  2. Add the environment variable HADOOP_HOME and edit the Path variable to add the bin directory of HADOOP_HOME (say C:\hadoop\bin).

Configure Hadoop

C:\hadoop\etc\hadoop\core-site.xml

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://localhost:9000</value>
        </property>
</configuration>

C:\hadoop\etc\hadoop\hdfs-site.xml

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:/hadoop/data/dfs/namenode</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:/hadoop/data/dfs/datanode</value>
        </property>
</configuration>

C:\hadoop\etc\hadoop\mapred-site.xml

<configuration>
        <property>
           <name>mapreduce.framework.name</name>
           <value>yarn</value>
        </property>
</configuration>

C:\hadoop\etc\hadoop\yarn-site.xml

<configuration>
        <property>
           <name>yarn.nodemanager.aux-services</name>
           <value>mapreduce_shuffle</value>
        </property>
        <property>
           <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
           <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
</configuration>
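The four *-site.xml files above all share the same `<configuration>`/`<property>` shape, so a small standard-library parser (my own sketch, not a Hadoop API) can verify that the expected keys actually made it into each file:

```python
import xml.etree.ElementTree as ET

def site_properties(xml_text):
    """Parse Hadoop *-site.xml content into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    return {prop.findtext("name"): prop.findtext("value")
            for prop in root.iter("property")}
```

For example, `site_properties(open(r"C:\hadoop\etc\hadoop\core-site.xml").read())` should map `fs.defaultFS` to `hdfs://localhost:9000` if the file matches the listing above.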

Format namenode

For the first time only, namenode needs to be formatted.

C:\Users\abhijitg>cd c:\hadoop\bin
c:\hadoop\bin>hdfs namenode -format

Start HDFS (Namenode and Datanode)

C:\Users\abhijitg>cd c:\hadoop\sbin
c:\hadoop\sbin>start-dfs

Start MapReduce aka YARN (Resource Manager and Node Manager)

C:\Users\abhijitg>cd c:\hadoop\sbin
c:\hadoop\sbin>start-yarn
starting yarn daemons

A total of four separate Command Prompt windows will be opened automatically to run the Namenode, Datanode, Resource Manager and Node Manager.
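Beyond watching the four windows, one way to confirm the namenode actually came up (my own suggestion, not from the original answer; 50070 was the default namenode web UI port in Hadoop 2.x) is to poll its HTTP endpoint:

```python
import time
import urllib.error
import urllib.request

def wait_for_http(url, timeout=30.0, interval=1.0):
    """Return True once url answers an HTTP request, False after timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except (urllib.error.URLError, OSError):
            # Daemon not up yet (connection refused / timed out); retry.
            time.sleep(interval)
    return False

# Example: wait_for_http("http://localhost:50070/")  # namenode web UI
```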

Reference: Build, Install, Configure and Run Apache Hadoop 2.2.0 in Microsoft Windows OS
