Why does Hadoop not recognize my Map class?


Problem description

I am trying to run my PDFWordCount MapReduce program on Hadoop 2.2.0, but I get this error:

13/12/25 23:37:26 INFO mapreduce.Job: Task Id : attempt_1388041362368_0003_m_000009_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class PDFWordCount$MyMap not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1720)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:721)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.lang.ClassNotFoundException: Class PDFWordCount$MyMap not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1718)
    ... 8 more

The error says that my map class cannot be found. I have a cluster with one NameNode and two DataNodes on three VMs.

My main function is this:

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    @SuppressWarnings("deprecation")
    Job job = new Job(conf, "wordcount");

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    job.setMapperClass(MyMap.class);
    job.setReducerClass(MyReduce.class);

    job.setInputFormatClass(PDFInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setJarByClass(PDFWordCount.class);
    job.waitForCompletion(true);
}

If I run my jar using this command:

yarn jar myjar.jar PDFWordCount /in /out

it takes /in as the output path and fails, even though I call job.setJarByClass(PDFWordCount.class); in my main function, as shown above.

I have run a simple WordCount project whose main function is exactly like this one. To run it, I used yarn jar wc.jar MyWordCount /in2 /out2, and it ran flawlessly.
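
A note on the argument shift described above: Hadoop's RunJar launcher (which runs behind yarn jar) reads the Main-Class attribute from the JAR manifest, and only when that attribute is missing does it treat the first argument after the JAR name as the class to run. If myjar.jar was exported with a Main-Class while wc.jar was not (an assumption, since neither manifest is shown here), the class name would be passed through as args[0] and /in would become args[1], the output path. The sketch below is a simplified model of that rule, not Hadoop's code; RunJarArgs and programArgs are hypothetical names.

```java
import java.util.Arrays;

final class RunJarArgs {
    /**
     * Simplified model of the launcher rule: returns the arguments that
     * reach main(), given the manifest's Main-Class value (null if absent)
     * and the arguments that follow the JAR name on the command line.
     */
    static String[] programArgs(String manifestMainClass, String[] afterJar) {
        if (manifestMainClass != null) {
            // The manifest names the entry point, so no argument is
            // consumed as a class name; everything is passed to main().
            return afterJar.clone();
        }
        if (afterJar.length == 0) {
            throw new IllegalArgumentException("no main class given");
        }
        // No Main-Class in the manifest: the first argument is the class name.
        return Arrays.copyOfRange(afterJar, 1, afterJar.length);
    }

    public static void main(String[] args) {
        String[] cli = {"PDFWordCount", "/in", "/out"};
        // JAR with Main-Class: args[1] is "/in", which main() uses as output path.
        System.out.println(Arrays.toString(programArgs("PDFWordCount", cli)));
        // JAR without Main-Class: args[0] is "/in", args[1] is "/out".
        System.out.println(Arrays.toString(programArgs(null, cli)));
    }
}
```

Under this model, a JAR that declares Main-Class should be run without the class name on the command line.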

I can't understand what the problem is!

UPDATE: I tried moving my work from this project into the WordCount project that I had used successfully. I created a package, copied the relevant files from the pdfwordcount project into it, and exported the project (my main was not changed to use PDFInputFormat, so I did nothing except move the Java files into the new package). It didn't work. I deleted the files from the other project, but it still didn't work. I moved the Java file back to the default package, but that didn't work either!

What's wrong?!

Solution

I found a way to work around this problem, even though I could not figure out what the actual problem was.

When I export my Java project as a JAR file in Eclipse, I have two options:

  1. Extract required libraries into generated JAR
  2. Package required libraries into generated JAR

I don't know exactly what the difference is, or whether it matters. I used to choose the second option, but if I choose the first option, I can run my job with this command:

yarn jar pdf.jar /in /out
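
On the difference between the two export options: "Extract required libraries" unpacks the library classes directly into the generated JAR, while "Package required libraries" stores the library JARs as nested entries and makes Eclipse's jar-in-jar loader the manifest entry point. Standard Java class loaders do not read JARs nested inside a JAR, so the second layout may be what keeps the task JVMs from resolving PDFWordCount$MyMap; that link is a plausible explanation, not something confirmed here. One way to compare the two exports is to inspect them, as in the minimal sketch below. JarLayoutCheck and its method names are hypothetical; only java.util.jar from the standard library is used.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

final class JarLayoutCheck {
    /** Returns the manifest's Main-Class value, or null if there is none. */
    static String mainClass(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            return manifest == null
                    ? null
                    : manifest.getMainAttributes().getValue("Main-Class");
        }
    }

    /** Returns the names of entries that are themselves JAR files. */
    static List<String> nestedJars(String jarPath) throws IOException {
        List<String> nested = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".jar")) {
                    nested.add(name);
                }
            }
        }
        return nested;
    }

    /** True if the class file for the binary name sits at its usual path. */
    static boolean containsClass(String jarPath, String binaryName) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            String entryName = binaryName.replace('.', '/') + ".class";
            return jar.getJarEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        String jarPath = args[0]; // e.g. the exported pdf.jar
        System.out.println("Main-Class:  " + mainClass(jarPath));
        System.out.println("Nested JARs: " + nestedJars(jarPath));
        System.out.println("Has MyMap:   " + containsClass(jarPath, "PDFWordCount$MyMap"));
    }
}
```

Running it against both exports shows whether the libraries ended up as nested JARs and which Main-Class the manifest declares.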
