Running Apache Hadoop 2.1.0 on Windows


Question

I am new to Hadoop and have run into problems trying to run it on my Windows 7 machine. In particular I am interested in running Hadoop 2.1.0, since its release notes mention that running on Windows is supported. I know that I can try to run 1.x versions on Windows with Cygwin, or even use a prepared VM from, for example, Cloudera, but these options are for various reasons less convenient for me.



Having examined the tarball from http://apache-mirror.rbc.ru/pub/apache/hadoop/common/hadoop-2.1.0-beta/ I found that there really are some *.cmd scripts that can be run without Cygwin. Everything worked fine when I formatted the HDFS partition, but when I tried to run the hdfs namenode daemon I faced two errors: the first, non-fatal, was that winutils.exe could not be found (it really wasn't present in the downloaded tarball). I found the sources of this component in the Apache Hadoop source tree and compiled it with the Microsoft SDK and MSBuild. Thanks to the detailed error message it was clear where to put the executable to satisfy Hadoop. But the second error, which is fatal, doesn't contain enough information for me to solve it:

    13/09/05 10:20:09 FATAL namenode.NameNode: Exception in namenode join
    java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
        at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
        at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:423)
        at org.apache.hadoop.fs.FileUtil.canWrite(FileUtil.java:952)
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:451)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:282)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:200)
    ...
    13/09/05 10:20:09 INFO util.ExitUtil: Exiting with status 1

It looks as if something else needs to be compiled. I am going to try building Hadoop from source with Maven, but isn't there a simpler way? Isn't there some option I don't know of that can disable native code and make the tarball usable on Windows?



Thank you.



UPDATED. Yes, indeed. The "homebrew" package contained some extra files, most importantly winutils.exe and hadoop.dll. With these files the namenode and datanode started successfully. I think the question can be closed. I have not deleted it, in case someone runs into the same difficulty.
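A note for anyone hitting the same UnsatisfiedLinkError: the point of the "homebrew" build is to get winutils.exe and hadoop.dll next to the Hadoop scripts. A minimal sketch of the placement (the paths are illustrative, not from the original post - adjust them to wherever you unpacked the tarball and wherever MSBuild put the artifacts):

```
:: Illustrative paths - adjust to your own layout.
set HADOOP_HOME=C:\hadoop-2.1.0-beta
copy winutils.exe %HADOOP_HOME%\bin\
copy hadoop.dll %HADOOP_HOME%\bin\
:: hadoop.dll must be loadable by the JVM, so keep %HADOOP_HOME%\bin on PATH:
set PATH=%HADOOP_HOME%\bin;%PATH%
```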



UPDATED 2. To build the "homebrew" package I did the following:


  1. Got the sources and unpacked them.

  2. Read BUILDING.txt carefully.

  3. Installed the dependencies:

    3a) Windows SDK 7.1
    3b) Maven (I used 3.0.5)
    3c) JDK (I used 1.7.25)
    3d) ProtocolBuffer (I used 2.5.0 - http://protobuf.googlecode.com/files/protoc-2.5.0-win32.zip). It is enough to put the compiler (protoc.exe) into one of the PATH folders.
    3e) A set of UNIX command-line tools (I installed Cygwin)

  4. Started the Windows SDK command line: Start | All Programs | Microsoft Windows SDK v7.1 | ... Command Prompt. (I modified this shortcut, adding the /Release option to the command line so that release versions of the native code are built.) All subsequent steps are performed from inside the SDK command-line window.

  5. Set up the environment:

    set JAVA_HOME={path_to_JDK_root}

  It seems that JAVA_HOME must NOT contain spaces!

    set PATH={path_to_maven_bin};%PATH%
    set Platform=x64
    set PATH={path_to_cygwin_bin};%PATH%
    set PATH={path_to_protoc.exe};%PATH%
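Since JAVA_HOME must not contain spaces, one common workaround (my suggestion, not part of the original steps) is to use the 8.3 short name of the JDK folder, and to sanity-check the toolchain before starting the build:

```
:: PROGRA~1 is typically the 8.3 short name of "Program Files";
:: verify the short names on your machine with: dir /x C:\
set JAVA_HOME=C:\PROGRA~1\Java\jdk1.7.0_25
:: Confirm each tool resolves from this SDK prompt before running Maven:
where mvn protoc sh
mvn -version
protoc --version
```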




  6. Changed directory to the sources root folder (BUILDING.txt warns that there are limitations on path length, so the sources root should have a short name - I used D:\hds).

  7. Ran the build process:

    mvn package -Pdist -DskipTests

  You can try it without 'skipTests', but on my machine some tests failed and the build was terminated. This may be connected with the symbolic-link issues mentioned in BUILDING.txt.

  8. Picked up the result in hadoop-dist\target\hadoop-2.1.0-beta (the Windows executables and DLLs are in the 'bin' folder).

Solution

I have followed these steps to install Hadoop 2.2.0.



Steps to build a Hadoop bin distribution for Windows


  1. Download and install Microsoft Windows SDK v7.1.

  2. Download and install the Unix command-line tools Cygwin.

  3. Download and install Maven 3.1.1.

  4. Download Protocol Buffers 2.5.0 and extract it to a folder (say c:\protobuf).

  5. Add the environment variables JAVA_HOME, M2_HOME and Platform if they are not set already.
    Note: the variable name Platform is case-sensitive, and its value should be x64 or Win32 for building on a 64-bit or 32-bit system respectively.
    Edit the Path variable to add the bin directory of Cygwin (say C:\cygwin64\bin), the bin directory of Maven (say C:\maven\bin) and the installation path of Protocol Buffers (say c:\protobuf).

  6. Download hadoop-2.2.0-src.tar.gz and extract it to a folder with a short path (say c:\hdfs) to avoid runtime problems caused by the maximum path length limitation in Windows.

  7. Select Start --> All Programs --> Microsoft Windows SDK v7.1 and open the Windows SDK 7.1 Command Prompt. Change directory to the Hadoop source code folder (c:\hdfs). Execute mvn package with the options -Pdist,native-win -DskipTests -Dtar to create a Windows binary tar distribution.

  8. If everything went well in the previous step, the native distribution hadoop-2.2.0.tar.gz will be created inside the C:\hdfs\hadoop-dist\target\hadoop-2.2.0 directory.

Install Hadoop


    1. Extract hadoop-2.2.0.tar.gz to a folder (say c:\hadoop).

    2. Add the environment variable HADOOP_HOME and edit the Path variable to add the bin directory of HADOOP_HOME (say C:\hadoop\bin).
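The install steps can be scripted if you prefer; setx persists the variable for new Command Prompt windows, while set affects only the current one (the folder name is the illustrative one used above):

```
:: Persist for future sessions (takes effect in NEW Command Prompt windows)
setx HADOOP_HOME C:\hadoop
:: And for the current session:
set HADOOP_HOME=C:\hadoop
set PATH=%HADOOP_HOME%\bin;%PATH%
```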


    Configure Hadoop



    C:\hadoop\etc\hadoop\core-site.xml

    <configuration>
            <property>
                    <name>fs.defaultFS</name>
                    <value>hdfs://localhost:9000</value>
            </property>
    </configuration>

    C:\hadoop\etc\hadoop\hdfs-site.xml

    <configuration>
            <property>
                    <name>dfs.replication</name>
                    <value>1</value>
            </property>
            <property>
                    <name>dfs.namenode.name.dir</name>
                    <value>file:/hadoop/data/dfs/namenode</value>
            </property>
            <property>
                    <name>dfs.datanode.data.dir</name>
                    <value>file:/hadoop/data/dfs/datanode</value>
            </property>
    </configuration>

    C:\hadoop\etc\hadoop\mapred-site.xml

    <configuration>
            <property>
                    <name>mapreduce.framework.name</name>
                    <value>yarn</value>
            </property>
    </configuration>

    C:\hadoop\etc\hadoop\yarn-site.xml

    <configuration>
            <property>
                    <name>yarn.nodemanager.aux-services</name>
                    <value>mapreduce_shuffle</value>
            </property>
            <property>
                    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
                    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
            </property>
    </configuration>

    Format namenode

    For the first time only, the namenode needs to be formatted.

      C:\Users\abhijitg>cd c:\hadoop\bin
      c:\hadoop\bin>hdfs namenode -format

    Start HDFS (Namenode and Datanode)

      C:\Users\abhijitg>cd c:\hadoop\sbin
      c:\hadoop\sbin>start-dfs

    Start MapReduce aka YARN (Resource Manager and Node Manager)

      C:\Users\abhijitg>cd c:\hadoop\sbin
      c:\hadoop\sbin>start-yarn
      starting yarn daemons

    In total, four separate Command Prompt windows will open automatically to run the Namenode, Datanode, Resource Manager and Node Manager.
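One way (my addition, not part of the referenced guide) to confirm that all four daemons are actually up is the JDK's jps tool, which lists the running local Java processes; on a healthy setup you should see entries named NameNode, DataNode, ResourceManager and NodeManager, with varying PIDs:

```
c:\hadoop\sbin>jps
```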
    Reference: Build, Install, Configure and Run Apache Hadoop 2.2.0 in Microsoft Windows OS


