How Do I Build Hadoop From Source Without Errors


Problem description

      I have spent weeks trying to resolve different errors in building Hadoop. SO was helpful in pointing me towards the answer to an occasional problem, but after a lot of searching here on SO, I was never able to get the whole thing to build.

      It’s been a couple of weeks since all this started so I have forgotten most of the explicit error messages, but the problems I had included

      • Protobuff versions being wrong
      • SSH connections not working
      • Mojofailure Exceptions during build
      • Incorrect Java versions being used
      • C++ sanity checks failing
      • a host of other crap that made no sense to me and I couldn't decipher root causes for

      Today I finally got Hadoop to build from the git repo source and wanted to record the process for the SO community members that face similar problems.

      For those of you trying to build Hadoop from source, here is how I got everything to compile from source.

      Some notes on configuration:

      • I am installing Hadoop in a virtual environment, in my case VirtualBox.
      • The Host machine runs Windows 7 x64
      • The Guest VM runs CentOS 7 x64
      • I am aiming for the bare minimum installation

      Solution

      How to Build Hadoop From Source Without Errors

      Preliminary Downloads: You need to download the following before you begin: VirtualBox (I used version 4.3.16 r95972), the CentOS 7 minimal iso, and WinSCP.

      This walk through consists of 4 Phases

      1. Create a CentOS Appliance inside VirtualBox that can support building Hadoop
      2. Add SSH capabilities to the Appliance so that downloaded prerequisites can be scp’ed from the Host to the Guest VM
      3. Install all the things (utilities and dependencies) needed to build Hadoop
      4. Build Hadoop without errors

      Phase 1 - Creating a CentOS Appliance for VirtualBox

      Start by opening VirtualBox and clicking on the "New" button in the top left corner. This will open a new window asking for some information about the virtual machine appliance you want to create.

      • Name it "CentOS x64 – Hadoop Base"
      • Select Linux as the "Type" of operating system
      • Select RedHat (64 Bit) as the "Version."
      • Click "Next"

      Follow the remaining prompts in the VM creation wizard. The only things I changed from the defaults were on the "Memory size" page (I used 4096 MB) and the "File location and size" page (I used 128 GB). I would encourage you to do the same if your system can support it. Leave all other defaults alone.

      • Click "Create" on the last page of the VM creation wizard

      Once created, the VM will show up on the left hand pane of the VirtualBox Window.

      • Double click on the VM you just created and wait for the dialog to come up asking you for the iso file you want to use.
      • When the dialog appears, click on the folder icon on the right and navigate to / select the "CentOS minimal iso" you downloaded during the Preliminary steps.
      • Once the iso is listed in the drop down box Click "Start"

      When prompted, after the VM boots, select "Install CentOS 7" (this is not the default, you have to press the "up" arrow) and press "Enter". When the setup program loads, the first thing it will ask you about is your keyboard layout. I leave the defaults in place and just click the "Continue" button in the lower right corner. This brings up the Installation Summary page on which you need to make changes to 2 areas: "Installation Destination" and "Network & Host Name"

      • Click "Installation Destination"
      • Double Click the virtual disk (make sure that the background is blue and the check mark is there)
      • Click "Done" to go back to the "Installation Summary" page.

      Back on the Installation Summary page:

      • Click "Network and Host Name"
      • In this menu screen turn on Ethernet networking by clicking the toggle switch on the right.
      • Click "Done" in the top left corner.

      With both modifications complete you can click the "Begin Installation" button in the bottom right corner. As the iso installs to your system you should take the time to provide a root password by

      • Clicking on that option at the top left of the page
      • Filling out the form it brings up
      • Clicking "Done" (if you select a password considered weak, you have to double click "Done" to accept anyway).

      I added a password, but I did not bother to add any non-root users.

      Once everything is installed click on the "Reboot" button that appears in the bottom right of the screen.

      Once the system reboots select CentOS 7 and allow it to boot. Check your credentials by logging in as root, and then close the CentOS VM by clicking on the red X button at the top right of the window and selecting "Power off the machine" when prompted.

      This completes Phase 1

      You should now be looking at just VirtualBox

      Phase 2 - Adding SSH capabilities to the VM to support download transfers

      • Open the settings of your CentOS Appliance by first clicking the appliance
      • Next, click the "Settings" button on the top left of VirtualBox’s main menu. This will bring up a new window.
      • In the left hand pane of the new window, click on "Network" which will display a set of adapter tabs.
      • Now click on the Triangle to the left of the label "Advanced".
      • This will reveal a series of options, but the one you need to click on is the button labeled "Port Forwarding"

      This will bring up another window where you can set port forwarding rules.

      • Click the green plus sign in the top right corner. This will produce a row where you can enter in a port forwarding rule.
      • Add the following rule to the row

      Name = ssh, Host port = 2222, Guest port = 22

      • Click the "OK" button on the Port Forwarding window
      • Click the "OK" button on the Appliance Settings window.

      With this rule in place you should now be able to ssh from your Windows Host to the CentOS Guest on port 2222 and avoid the following error:

      ssh: connect to host localhost port 22: Connection refused
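The same forwarding rule can also be created from the Host's command line with VirtualBox's VBoxManage tool while the VM is powered off. The sketch below is an assumption about how you might script it, not part of the original steps: `natpf_rule` is a hypothetical helper, and the VM name must match whatever you named your appliance. It only prints the command so you can review it before running it (for example from a Cygwin shell on the Windows Host).

```shell
# Build a NAT port-forwarding rule in the format VBoxManage expects:
#   name,protocol,host-ip,host-port,guest-ip,guest-port
# Host and guest IP are left empty, as in the GUI rule above.
natpf_rule() {
  printf '%s,tcp,,%s,,%s' "$1" "$2" "$3"
}

# VM name used earlier in this walkthrough; adjust it to match yours.
VM_NAME="CentOS x64 – Hadoop Base"

# Print the command rather than executing it.
echo "VBoxManage modifyvm \"$VM_NAME\" --natpf1 \"$(natpf_rule ssh 2222 22)\""
```

Copy the printed line and run it once you are happy with the VM name and ports.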

      You should now be looking at just VirtualBox again.

      • Start the CentOS VM appliance and log in as root.
      • Once logged in, execute the following line from the command prompt.

      yum -y install openssh-server openssh-clients

      This command will install a ssh server on the CentOS VM. After the install, confirm that the ssh server is running by typing the following command.

      ps aux | grep sshd

      This command should return 2 processes showing sshd (the ssh daemon). One is the grep command itself. The other is your server running in the background.
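If the extra grep line is distracting, a common shell idiom is to wrap one letter of the pattern in brackets so that grep no longer matches its own command line. The helper name `sshd_lines` below is made up for illustration; it simply filters whatever process listing you pipe into it.

```shell
# Keep only lines that mention sshd.
# The pattern '[s]shd' matches the text "sshd", but grep's own command line
# contains "[s]shd", which the pattern does not match.
sshd_lines() {
  grep '[s]shd'
}

# Usage on the VM:
#   ps aux | sshd_lines
```

With this filter a running server shows up as a single line instead of two.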

      Now we need to make sure that ssh did in fact generate the keys it will need to communicate with WinSCP. Issue the following command and make sure that all keys’ byte size values are non-zero.

      ls -l /etc/ssh

      If the sizes of the keys are 0 bytes, you need to remove them, restart the sshd daemon, and validate that the keys were regenerated when sshd restarted. To do all that, execute the following commands

      rm -rf /etc/ssh/ssh*key*

      systemctl restart sshd

      ls -l /etc/ssh

      This process will help avoid unexpected "connection closed by 127.0.0.1" errors.
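Rather than reading the sizes off the listing by eye, you could let find report any empty key files. This is only a sketch: `empty_host_keys` is a hypothetical helper, and the `ssh_host_*key*` file name pattern is an assumption about how the host keys are named on your system.

```shell
# Print host key files in the given directory (default /etc/ssh)
# that are zero bytes long. No output means no empty key files were found.
empty_host_keys() {
  find "${1:-/etc/ssh}" -maxdepth 1 -type f -name 'ssh_host_*key*' -size 0
}

# Usage on the VM:
#   empty_host_keys            # checks /etc/ssh
#   empty_host_keys /some/dir  # checks another directory
```

If the function prints anything, remove the keys and restart sshd as described above.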

      Now that we have an ssh daemon up and keys generated, we are going to test the connection. Start by opening WinSCP and entering the following values in the login dialog that pops up.

      Host name = localhost, Port number = 2222, User name = root, Password = (the root password you set in Phase 1), File Protocol = SCP.

      Note that you need to set "File Protocol" last. If you don’t, it will try to outsmart you when you enter a "Port number" that it isn’t expecting. When all the values are entered, click the "Login" button and accept (click Update or OK on) any security warnings you get.

      Once you have logged in, move a file between the Host and VM Guest to confirm everything is working.

      Though I won’t focus on it here, you can also use Cygwin to connect to the VM, and it is useful for diagnosing connection problems. The command you need to enter to get verbose diagnostic output is

      ssh -vvv -p 2222 root@localhost

      This completes Phase 2

      Phase 3 - Install Utilities and Dependencies Needed to Build Hadoop

      Our CentOS distribution really is "barebones" and so we need to install everything required to build Hadoop. We will do this by downloading most things in Windows and then moving them over to the VM via WinSCP.

      Before we start, we need to add a "downloads" directory to the home directory of the root user on the CentOS VM by issuing the following command at the CentOS command line.

      mkdir ~/downloads/

      We can now begin downloading Hadoop dependencies. We will download everything to Windows and then use WinSCP to move it over to the VM.

      Start by downloading the Java 7 JDK from - http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html

      Ignore the "End of Public Updates" error message at the top of the page. Java 7 is what Apache recommends.

      You want to download the jdk-7u79-linux-x64.rpm file

      Once downloaded use WinSCP to navigate to the Downloads directory of the Host computer and the newly created "downloads" directory of the Guest VM (you may need to click the refresh icon on the VM side of the WinSCP pane to see the directory). Drag and drop the jdk file from the Host over to the VM Guest.

      Now we just need to install the JDK on the CentOS VM. From the CentOS command line, change to the "downloads" directory we created under root’s home, then use rpm to install Java 7.

      cd ~/downloads

      rpm -ihv jdk-7u79-linux-x64.rpm

      Once installation is complete, you can verify it by typing

      java -version

      This produces output stating that you have a Java Runtime Environment installed.
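Since a wrong Java version was one of the problems listed in the question, it can help to pull the version string out of that output and compare it with what you expect. The sketch below assumes the first line of output has the form `java version "1.7.0_79"`; `java_version_from` is a hypothetical helper name.

```shell
# Read `java -version` output on stdin and print the quoted version string
# from the first line that contains one.
java_version_from() {
  sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Usage on the VM (java -version writes to stderr, hence 2>&1):
#   java -version 2>&1 | java_version_from
```

For this walkthrough the printed version should start with 1.7.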

      Next we are going to install a subset of the packages Hadoop needs to build successfully. The list is taken straight from the Apache website: https://wiki.apache.org/hadoop/HowToContribute and the command we need to enter on the command line to retrieve them is:

      yum -y install lzo-devel zlib-devel gcc autoconf automake libtool openssl-devel fuse-devel

      Next we are going to install Apache’s Maven. You can download it here:
      https://archive.apache.org/dist/maven/binaries/

      Apache’s website says you can use version 3+. I used version 3.2.2 so download this file to follow along:

      apache-maven-3.2.2-bin.tar.gz

      Once you have the file downloaded, use WinSCP to move it from your host computer to the Guest VM’s "downloads" folder just like you did with the JDK file. We then untar the file into the /usr/local/ directory, and create a symbolic link in the /usr/local/ directory that points to the maven folder with the following three commands.

      tar xzf apache-maven-3.2.2-bin.tar.gz -C /usr/local

      cd /usr/local

      ln -s apache-maven-3.2.2 maven

      We now need to add Maven’s bin directory to the $PATH variable. We do so by editing the .bashrc file in root’s home directory. Open the file for editing in vi by using the following command

      vi ~/.bashrc

      This will bring up the .bashrc file in the vi editor (if you need it, a tutorial on vi can be found here: http://www.unix-manuals.com/tutorials/vi/vi-in-10-1.html ). Follow these instructions to update the file correctly.

      • Move the cursor to the end of the file by pressing "G", then enter Edit mode on a new line by pressing the "o" key
      • Add the following lines to the file:
        • export M2_HOME=/usr/local/maven
        • export PATH=$M2_HOME/bin:$PATH
      • Press the "Esc" key to leave Edit mode
      • Type ":wq" – it will automatically show up at the vi command line (bottom left of the screen)
      • Press "Enter"
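If you would rather not edit the file by hand, the same two lines can be appended from the shell. This is a sketch under the assumption that Maven was linked at /usr/local/maven as above; `add_maven_to_path` is a hypothetical helper, and it skips the append when the M2_HOME line is already present so that running it twice is harmless.

```shell
# Append the Maven environment lines to the given shell startup file,
# unless the M2_HOME line is already there.
add_maven_to_path() {
  rc_file="$1"
  grep -q 'M2_HOME=/usr/local/maven' "$rc_file" 2>/dev/null && return 0
  # Quoted 'EOF' keeps $M2_HOME and $PATH unexpanded in the file.
  cat >> "$rc_file" <<'EOF'
export M2_HOME=/usr/local/maven
export PATH=$M2_HOME/bin:$PATH
EOF
}

# Usage on the VM:
#   add_maven_to_path ~/.bashrc
```

Either way, the change only takes effect in new login sessions or after sourcing the file.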

      Now log out of CentOS. Log back into CentOS, and check to make sure that the new PATH variable is appropriately set using the following commands.

      exit

      <log back in as root>

      mvn -version

      You should see output indicating that Maven is installed.

      Next we need to install C++ support for gcc. We do that with the following one line command

      yum -y install gcc-c++.x86_64

      Next we need to install git so that we can pull down the Hadoop source code.

      yum -y install git

      Once you have git, go ahead and pull down the Hadoop source. There is still one more thing (ProtocolBuffer) we need before we can build the source code, but we need to see the BUILDING.txt file in the repo before we download ProtocolBuffer to make sure that we get the right version.

      To get the Hadoop source we run the git clone command. Simply execute the following commands from the CentOS command line to download the Hadoop repo.

      cd /usr/local

      git clone git://git.apache.org/hadoop.git

      The clone operation will place a "hadoop" directory in your /usr/local directory. When the operation has completed and you have the command prompt back, take a look at the BUILDING.txt file in your new hadoop directory using the following command:

      less /usr/local/hadoop/BUILDING.txt

      In the "Requirements" section of the file it states the version of ProtocolBuffer we need for Hadoop to build correctly. In this case it’s ProtocolBuffer 2.5.0. With this information in hand we go back to the command prompt by pressing "q" for quit.
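Instead of paging through the file, you could search it for the requirement directly. The sketch below assumes the requirement appears on a line that contains the word ProtocolBuffer followed by a dotted version number; `required_protobuf_version` is a hypothetical helper name.

```shell
# Print the first dotted version number found on the first line of the
# given file that mentions ProtocolBuffer (case-insensitive).
required_protobuf_version() {
  grep -i -m 1 'ProtocolBuffer' "$1" | grep -o '[0-9][0-9.]*[0-9]' | head -n 1
}

# Usage on the VM:
#   required_protobuf_version /usr/local/hadoop/BUILDING.txt
```

If the function prints nothing, fall back to reading the "Requirements" section by hand.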

      Now we can finally install the last of the things Hadoop needs: ProtocolBuffer. To get the right version of ProtocolBuffer, we visit the ProtocolBuffer release page:

      https://github.com/google/protobuf/releases

      and scroll down until we see the version needed for Hadoop to compile. For this walkthrough you want to download the following file.

      protobuf-2.5.0.tar.gz

      Once downloaded, use WinSCP and transfer it to the VM’s "downloads" folder like you did earlier for the other downloads. Once the file is sitting in the VM’s "downloads" folder, issue the following commands to install ProtocolBuffer on CentOS

      cd ~/downloads

      tar xzf protobuf-2.5.0.tar.gz -C /usr/local

      cd /usr/local/protobuf-2.5.0

      ./configure

      make

      make install

      Once this is done all the prerequisite utilities and dependencies needed for building Hadoop will be installed.
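Before moving on, it may be worth confirming that every tool is actually reachable from the shell and that protoc reports the version the build expects. The following is a minimal sketch: `missing_tools` and `protoc_matches` are hypothetical helpers, and the tool list in the usage comment is an assumption based on the packages installed in this phase.

```shell
# Print the name of each given tool that cannot be found on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Succeed if the `protoc --version` output ($2) reports the required
# version ($1). protoc prints its version as "libprotoc <version>".
protoc_matches() {
  [ "$2" = "libprotoc $1" ]
}

# Usage on the VM:
#   missing_tools java mvn git gcc g++ make protoc
#   protoc_matches 2.5.0 "$(protoc --version)" || echo "wrong protoc version"
```

No output from either usage line suggests the prerequisites are in place.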

      This completes Phase 3

      Phase 4 - Build Hadoop Without Errors

      Go to the Hadoop directory, and run Maven skipping the tests using the following commands:

      cd /usr/local/hadoop

      mvn clean install -DskipTests
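The build takes a while and prints a lot of output, so you may want to keep a copy of it in a file and check the result afterwards. This is an optional variation, not something the original steps require: `build_succeeded` is a hypothetical helper and the log file name is an arbitrary choice. It relies on Maven printing a BUILD SUCCESS line when the build finishes cleanly.

```shell
# Succeed if the given Maven log file contains the BUILD SUCCESS marker.
build_succeeded() {
  grep -q 'BUILD SUCCESS' "$1"
}

# Usage on the VM:
#   cd /usr/local/hadoop
#   mvn clean install -DskipTests 2>&1 | tee ~/hadoop-build.log
#   build_succeeded ~/hadoop-build.log && echo "build OK"
```

If the check fails, the saved log is the place to look for the first ERROR line.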

      The build should now complete without any problems, and when everything is finished Maven should report BUILD SUCCESS at the end of its output.

      This completes the walk through

      I hope some of you find it helpful.
