Hadoop MapReduce job starts but cannot find the Map class?


Problem description

My MapReduce app counts usage of field values in a Hive table. I managed to build and run it from Eclipse after including all the jars from the /usr/lib/hadoop, /usr/lib/hive, and /usr/lib/hcatalog directories. It works.

After many frustrations I have also managed to compile and run it as a Maven project:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.bigdata.hadoop</groupId>
  <artifactId>FieldCounts</artifactId>
  <packaging>jar</packaging>
  <name>FieldCounts</name>
  <version>0.0.1-SNAPSHOT</version>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.3.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hcatalog</groupId>
      <artifactId>hcatalog-core</artifactId>
      <version>0.11.0</version>
    </dependency>
  </dependencies>
</project>

To run the job from the command line, the following script has to be used:

#!/bin/sh
export LIBJARS=/usr/lib/hcatalog/share/hcatalog/hcatalog-core.jar,/usr/lib/hive/lib/hive-exec-0.12.0.2.0.6.1-101.jar,/usr/lib/hive/lib/hive-metastore-0.12.0.2.0.6.1-101.jar,/usr/lib/hive/lib/libfb303-0.9.0.jar,/usr/lib/hive/lib/jdo-api-3.0.1.jar,/usr/lib/hive/lib/antlr-runtime-3.4.jar,/usr/lib/hive/lib/datanucleus-api-jdo-3.2.1.jar,/usr/lib/hive/lib/datanucleus-core-3.2.2.jar
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:.:/usr/lib/hcatalog/share/hcatalog/hcatalog-core.jar:/usr/lib/hive/lib/hive-exec-0.12.0.2.0.6.1-101.jar:/usr/lib/hive/lib/hive-metastore-0.12.0.2.0.6.1-101.jar:/usr/lib/hive/lib/libfb303-0.9.0.jar:/usr/lib/hive/lib/jdo-api-3.0.1.jar:/usr/lib/hive/lib/antlr-runtime-3.4.jar:/usr/lib/hive/lib/datanucleus-api-jdo-3.2.1.jar:/usr/lib/hive/lib/datanucleus-core-3.2.2.jar
hadoop jar FieldCounts-0.0.1-SNAPSHOT.jar com.bigdata.hadoop.FieldCounts -libjars ${LIBJARS} simple simpout

Now Hadoop creates and starts the job, which then fails because Hadoop cannot find the Map class:

14/03/26 16:25:58 INFO mapreduce.Job: Running job: job_1395407010870_0007
14/03/26 16:26:07 INFO mapreduce.Job: Job job_1395407010870_0007 running in uber mode : false
14/03/26 16:26:07 INFO mapreduce.Job:  map 0% reduce 0%
14/03/26 16:26:13 INFO mapreduce.Job: Task Id : attempt_1395407010870_0007_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.bigdata.hadoop.FieldCounts$Map not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1720)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:721)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.lang.ClassNotFoundException: Class com.bigdata.hadoop.FieldCounts$Map not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1718)
... 8 more

Why does this happen? The job jar contains all the classes, including Map:

 jar tvf FieldCounts-0.0.1-SNAPSHOT.jar 
     0 Wed Mar 26 15:51:06 MSK 2014 META-INF/
   121 Wed Mar 26 15:51:04 MSK 2014 META-INF/MANIFEST.MF
     0 Wed Mar 26 14:29:58 MSK 2014 com/
     0 Wed Mar 26 14:29:58 MSK 2014 com/bigdata/
     0 Wed Mar 26 14:29:58 MSK 2014 com/bigdata/hadoop/
  3992 Fri Mar 21 17:29:22 MSK 2014 hive-site.xml
  4093 Wed Mar 26 14:29:58 MSK 2014 com/bigdata/hadoop/FieldCounts.class
  2961 Wed Mar 26 14:29:58 MSK 2014 com/bigdata/hadoop/FieldCounts$Reduce.class
  1621 Wed Mar 26 14:29:58 MSK 2014 com/bigdata/hadoop/TableFieldValueKey.class
  4186 Wed Mar 26 14:29:58 MSK 2014 com/bigdata/hadoop/FieldCounts$Map.class
     0 Wed Mar 26 15:51:06 MSK 2014 META-INF/maven/
     0 Wed Mar 26 15:51:06 MSK 2014 META-INF/maven/com.bigdata.hadoop/
     0 Wed Mar 26 15:51:06 MSK 2014 META-INF/maven/com.bigdata.hadoop/FieldCounts/
  1030 Wed Mar 26 14:28:22 MSK 2014 META-INF/maven/com.bigdata.hadoop/FieldCounts/pom.xml
   123 Wed Mar 26 14:30:02 MSK 2014 META-INF/maven/com.bigdata.hadoop/FieldCounts/pom.properties

What is wrong? Should I put the Map and Reduce classes in separate files?

MapReduce code:

package com.bigdata.hadoop;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.util.*;
import org.apache.hcatalog.mapreduce.*;
import org.apache.hcatalog.data.*;
import org.apache.hcatalog.data.schema.*;
import org.apache.log4j.Logger;

public class FieldCounts extends Configured implements Tool {

    public static class Map extends Mapper<WritableComparable, HCatRecord, TableFieldValueKey, IntWritable> {

        static Logger logger = Logger.getLogger("com.foo.Bar");

        static boolean firstMapRun = true;
        static List<String> fieldNameList = new LinkedList<String>();
        /**
         * Return a list of field names not containing `id` field name
         * @param schema
         * @return
         */
        static List<String> getFieldNames(HCatSchema schema) {
            // Filter out `id` name just once
            if (firstMapRun) {
                firstMapRun = false;
                List<String> fieldNames = schema.getFieldNames();
                for (String fieldName : fieldNames) {
                    if (!fieldName.equals("id")) {
                        fieldNameList.add(fieldName);
                    }
                }
            } // if (firstMapRun)
            return fieldNameList;
        }

        @Override
        protected void map( WritableComparable key,
                            HCatRecord hcatRecord,
                            //org.apache.hadoop.mapreduce.Mapper
                            //<WritableComparable, HCatRecord, Text, IntWritable>.Context context)
                            Context context)
            throws IOException, InterruptedException {

            HCatSchema schema = HCatBaseInputFormat.getTableSchema(context.getConfiguration());

           //String schemaTypeStr = schema.getSchemaAsTypeString();
           //logger.info("******** schemaTypeStr ********** : "+schemaTypeStr);

           //List<String> fieldNames = schema.getFieldNames();
            List<String> fieldNames = getFieldNames(schema);
            for (String fieldName : fieldNames) {
                Object value = hcatRecord.get(fieldName, schema);
                String fieldValue = null;
                if (null == value) {
                    fieldValue = "<NULL>";
                } else {
                    fieldValue = value.toString();
                }
                //String fieldNameValue = fieldName+"."+fieldValue;
                //context.write(new Text(fieldNameValue), new IntWritable(1));
                TableFieldValueKey fieldKey = new TableFieldValueKey();
                fieldKey.fieldName = fieldName;
                fieldKey.fieldValue = fieldValue;
                context.write(fieldKey, new IntWritable(1));
            }

        }       
    }

    public static class Reduce extends Reducer<TableFieldValueKey, IntWritable,
                                       WritableComparable, HCatRecord> {

        protected void reduce( TableFieldValueKey key,
                               java.lang.Iterable<IntWritable> values,
                               Context context)
                               //org.apache.hadoop.mapreduce.Reducer<Text, IntWritable,
                               //WritableComparable, HCatRecord>.Context context)
            throws IOException, InterruptedException {
            Iterator<IntWritable> iter = values.iterator();
            int sum = 0;
            // Sum up occurrences of the given key 
            while (iter.hasNext()) {
                IntWritable iw = iter.next();
                sum = sum + iw.get();
            }

            HCatRecord record = new DefaultHCatRecord(3);
            record.set(0, key.fieldName);
            record.set(1, key.fieldValue);
            record.set(2, sum);

            context.write(null, record);
        }
    }

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        args = new GenericOptionsParser(conf, args).getRemainingArgs();

        // To fix Hadoop "META-INFO" (http://stackoverflow.com/questions/17265002/hadoop-no-filesystem-for-scheme-file)
        conf.set("fs.hdfs.impl",
                org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
        conf.set("fs.file.impl",
                org.apache.hadoop.fs.LocalFileSystem.class.getName());

        // Get the input and output table names as arguments
        String inputTableName = args[0];
        String outputTableName = args[1];
        // Assume the default database
        String dbName = null;

        Job job = new Job(conf, "FieldCounts");

        HCatInputFormat.setInput(job,
                InputJobInfo.create(dbName, inputTableName, null));
        job.setJarByClass(FieldCounts.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        // An HCatalog record as input
        job.setInputFormatClass(HCatInputFormat.class);

        // Mapper emits TableFieldValueKey as key and an integer as value
        job.setMapOutputKeyClass(TableFieldValueKey.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Ignore the key for the reducer output; emitting an HCatalog record as
        // value
        job.setOutputKeyClass(WritableComparable.class);
        job.setOutputValueClass(DefaultHCatRecord.class);
        job.setOutputFormatClass(HCatOutputFormat.class);

        HCatOutputFormat.setOutput(job,
                OutputJobInfo.create(dbName, outputTableName, null));
        HCatSchema s = HCatOutputFormat.getTableSchema(job);
        System.err.println("INFO: output schema explicitly set for writing:"
                + s);
        HCatOutputFormat.setSchema(job, s);
        return (job.waitForCompletion(true) ? 0 : 1);
    }

    public static void main(String[] args) throws Exception {
        String classpath = System.getProperty("java.class.path");
        System.out.println("*** CLASSPATH: "+classpath);         
        int exitCode = ToolRunner.run(new FieldCounts(), args);
        System.exit(exitCode);
    }
}

Answer

As I found out, the problem was with the permissions on the directory where the MapReduce jar was located. The jar was built in the home directory of a regular user, not the hdfs user. Since this MapReduce job writes its results directly into a Hive table, it has to be run as the hdfs user; when it is run as a regular user, it has no permission to write data into the Hive table!
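
Running the submit script as the hdfs user can be done with sudo, for example (a minimal sketch; the script name run_fieldcounts.sh is hypothetical, not from the original post):

# Run the job submission script as the hdfs user (script name is hypothetical).
sudo -u hdfs ./run_fieldcounts.sh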

On the other hand, the home directory of a regular user in CentOS has 700 permissions. So when you run the hadoop jar ... command as a user other than the one who owns that home directory, access to the MapReduce jar is denied somewhere in the process of Hadoop loading classes. That is why, under the hdfs user, this job fails with java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.bigdata.hadoop.MyMap not found.
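
You can check this with ls -ld; a 700 home directory shows up as drwx------ (the user name and path below are hypothetical):

# Show the permissions of the home directory holding the jar (path is hypothetical).
ls -ld /home/someuser
# drwx------. 5 someuser someuser 4096 Mar 26 14:29 /home/someuser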

Recursively changing the permissions on the home directory where the MapReduce jar was built from 700 to 755 solves this problem.
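
A minimal sketch of that fix, assuming the jar was built somewhere under /home/someuser (a hypothetical path):

# Recursively grant group/other read and execute access so Hadoop can load the jar (path is hypothetical).
chmod -R 755 /home/someuser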

Yet a more important problem remains: how can the job be run as a regular user so that it has permission to write data into the Hive table?
