Class com.hadoop.compression.lzo.LzoCodec not found for Spark on CDH 5?


Problem Description


I have been working on this problem for two days and still have not found a way to fix it.

Problem: Our Spark installation, from the latest CDH 5, always complains about the missing LzoCodec class, even after I installed HADOOP_LZO through Parcels in Cloudera Manager. We are running MR1 on CDH 5.0.0-1.cdh5.0.0.p0.47.

Attempted fix: I also added the configuration from the official CDH documentation about 'Using the LZO Parcel' (https://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.7.3/Cloudera-Manager-Installation-Guide/cmig_install_LZO_Compression.html), but the problem is still there.

Most of the posts I found via Google give advice similar to the above. I also suspect that Spark is trying to run against YARN, which is not activated there, but I cannot find anything about this in the CMF configuration or in other posts on the topic.

Please give me some help if you know how to deal with it.

Solution

Solved!! May this solution help others who run into the same problem.


In this tutorial, I will show you how to enable LZO compression on Hadoop, Pig and Spark. I assume that you have already set up a basic Hadoop installation successfully (if not, please refer to other tutorials on Hadoop installation).

You probably reached this page because you ran into the same problem I did, which usually starts with a Java exception:

Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found.

As the Apache and Cloudera distributions are two of the most popular, configurations for both are shown. Briefly, we will walk through three main steps towards the final success:

  • Installing native-lzo libraries
  • Installing hadoop-lzo library
  • Setting up environment variables correctly (the part that cost me the most time)

Step1: Installing native-lzo libraries

The native-lzo library is required for the installation of hadoop-lzo. You can install it manually or through your package manager (NOTE: make sure all nodes in the cluster have native-lzo installed):

  • On Mac OS:

    sudo port install lzop lzo2
    

  • On RH or CentOS:

    sudo yum install lzo liblzo-devel
    

  • On Debian or Ubuntu:

    sudo apt-get install liblzo2-dev
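
You can quickly confirm that the library landed where the loader will find it. A minimal check, with the caveat that library names and install prefixes vary by OS (the Mac OS path below assumes the default MacPorts prefix):

    ldconfig -p | grep lzo          # Linux: expect liblzo2.so.2 to be listed
    ls /opt/local/lib/liblzo2*      # Mac OS with MacPorts (assumed prefix)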
    

Step2: Installing hadoop-lzo library

For Apache Hadoop

As LZO is GPL-licensed, it is not shipped with the official Hadoop distribution, which uses the Apache Software License. I recommend the Twitter version, a fork of hadoop-gpl-compression (https://code.google.com/a/apache-extras.org/p/hadoop-gpl-compression) with remarkable improvements. If you are running the official Hadoop, some installation instructions are provided in the documentation.
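
If you need to build it yourself, here is a minimal build sketch for the Twitter fork (this assumes the repository at https://github.com/twitter/hadoop-lzo plus git, a JDK, Maven and the native-lzo headers from Step1; adjust to your environment):

    # point C_INCLUDE_PATH/LIBRARY_PATH at lzo if it lives in a non-standard prefix
    git clone https://github.com/twitter/hadoop-lzo.git
    cd hadoop-lzo
    mvn clean package -Dmaven.test.skip=true
    # the jar ends up under target/, the native libraries under target/native/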

For Cloudera Distribution

In Cloudera's CDH, hadoop-lzo is shipped to customers as a parcel, and you can download and distribute it conveniently using Cloudera Manager. By default, hadoop-lzo will be installed in /opt/cloudera/parcels/HADOOP_LZO.

Here we show the configuration on our cluster:

  • Cloudera CDH 5
  • HADOOP_LZO version 0.4.15
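
The exact layout inside the parcel can vary between parcel versions, so rather than assuming paths, locate the jar and the native libraries directly; the directories reported here are what the /path/to/your/hadoop-lzo placeholders in Step3 refer to:

    find /opt/cloudera/parcels/HADOOP_LZO -name 'hadoop-lzo*.jar'
    find /opt/cloudera/parcels/HADOOP_LZO -name 'libgplcompression*'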

Step3: Setting up env variables

For Apache Hadoop/Pig

The basic configuration is for Apache Hadoop, while Pig piggybacks on its functionality.

  • Set the compression codecs in core-site.xml:

    <property>
      <name>io.compression.codecs</name>
      <value>org.apache.hadoop.io.compress.GzipCodec,
          org.apache.hadoop.io.compress.DefaultCodec,
          org.apache.hadoop.io.compress.BZip2Codec,
          com.hadoop.compression.lzo.LzoCodec,
          com.hadoop.compression.lzo.LzopCodec
      </value>
    </property>
    <property>
      <name>io.compression.codec.lzo.class</name>
      <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>
    

  • Set MapReduce compression configuration in mapred-site.xml:

    <property>
      <name>mapred.compress.map.output</name>
      <value>true</value>
    </property>
    <property>
      <name>mapred.map.output.compression.codec</name>
      <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>
    <property>
      <name>mapred.child.env</name>
      <value>JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/path/to/your/hadoop-lzo/libs/native</value>
    </property>
    

  • Append to HADOOP_CLASSPATH in hadoop-env.sh:

    HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/cloudera/parcels/CDH/lib/hadoop/lib/*
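
As a quick sanity check (a sketch, not part of the original setup), the codec class should now resolve on the client; hadoop can run a class by name, so:

    hadoop com.hadoop.compression.lzo.LzoIndexer   # a ClassNotFoundException here means the jar is still not on the classpath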
    

For Cloudera Distribution

You can use Cloudera Manager to apply the same settings as above through its GUI:

  • For the MapReduce component, change the corresponding configuration keys as above (an example snippet for the safety valve is sketched after this list):

    > **io.compression.codecs**
    > **mapred.compress.map.output**
    > **mapred.map.output.compression.codec**
    > **MapReduce Client safety valve for mapred-site.xml**
    

  • Edit the MapReduce Client Environment Snippet for hadoop-env.sh to append to the HADOOP_CLASSPATH variable.
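
For the safety valve itself, what goes in is presumably the remaining mapred-site.xml property from above that has no dedicated field, i.e. mapred.child.env (a sketch; reuse whatever native-library path applies on your cluster):

    <property>
      <name>mapred.child.env</name>
      <value>JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/path/to/your/hadoop-lzo/libs/native</value>
    </property>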

Finally, restart the dependent services in the right order and deploy the configuration to all nodes. That's it! You can then test the functionality with the command below and should see success messages similar to these:

   $ hadoop jar /path/to/hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer lzo_logs
   $ 14/05/04 01:13:13 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
   $ 14/05/04 01:13:13 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 49753b4b5a029410c3bd91278c360c2241328387]
   $ 14/05/04 01:13:14 INFO lzo.LzoIndexer: [INDEX] LZO Indexing file datasets/lzo_logs size 0.00 GB...
   $ 14/05/04 01:13:14 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
   $ 14/05/04 01:13:14 INFO lzo.LzoIndexer: Completed LZO Indexing in 0.39 seconds (0.02 MB/s).  Index size is 0.01 KB.
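
For reference, an input directory like lzo_logs above can be prepared with lzop and uploaded to HDFS beforehand (a sketch; the file name is illustrative and any text file will do):

    lzop -c /etc/hosts > access.log.lzo    # compress a small text file with lzop
    hadoop fs -mkdir -p lzo_logs
    hadoop fs -put access.log.lzo lzo_logs/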

For Spark

This cost me a lot of time because there is little information about it in earlier posts, but with the experience above the solution is straightforward.

Whether Spark is installed from a tarball or via Cloudera Manager, you merely need to append two path values to spark-env.sh:

   SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/path/to/your/hadoop-lzo/libs/native
   SPARK_CLASSPATH=$SPARK_CLASSPATH:/path/to/your/hadoop-lzo/java/libs
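
After restarting Spark, a quick check from spark-shell is enough to confirm the codec is picked up (a sketch; the path matches the test file created earlier, and sc.textFile simply decompresses the whole file through the registered codec):

    spark-shell
    scala> sc.textFile("lzo_logs/access.log.lzo").count()   // should run without ClassNotFoundException

Note that reading .lzo files this way is not splittable; for parallel reads of large files you would index them (as above) and use the LzoTextInputFormat shipped with hadoop-lzo.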

Related posts and questions

A comparison of LZO performance is given in another post (http://blog.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/). A related question was also asked on Stack Overflow, but no solution had been posted there as of the completion of this tutorial. You may also be interested in how to use the LZO Parcel from Cloudera (http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/latest/Cloudera-Manager-Installation-Guide/cmig_install_LZO_Compression.html).
