ClassNotFoundException org.apache.mahout.math.VectorWritable


Problem description


I'm trying to turn a CSV file into sequence files so that I can train and run a classifier on the data. I have a Java job file that I compile and then jar into the Mahout job jar. When I try to hadoop jar my job in the Mahout jar, I get a java.lang.ClassNotFoundException: org.apache.mahout.math.VectorWritable. I'm not sure why, because if I look inside the Mahout jar, that class is indeed present.

Here are the steps I'm running:

#get new copy of mahout jar
rm iris.jar
cp /home/stephen/home/libs/mahout-distribution-0.7/core/target/mahout-core-0.7-job.jar iris.jar    
javac -cp :/home/stephen/home/libs/hadoop-1.0.4/hadoop-core-1.0.4.jar:/home/stephen/home/libs/mahout-distribution-0.7/core/target/mahout-core-0.7-job.jar -d bin/ src/edu/iris/seq/CsvToSequenceFile.java    
jar ufv iris.jar -C bin .    
hadoop jar iris.jar edu.iris.seq.CsvToSequenceFile iris-data iris-seq

and this is what my Java file looks like:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.mahout.math.VectorWritable;

public class CsvToSequenceFile {

public static void main(String[] args) throws IOException,
        InterruptedException, ClassNotFoundException {

    String inputPath = args[0];
    String outputPath = args[1];

    Configuration conf = new Configuration();
    Job job = new Job(conf);
    job.setJobName("Csv to SequenceFile");
    job.setJarByClass(Mapper.class);

    job.setMapperClass(Mapper.class);
    job.setReducerClass(Reducer.class);

    job.setNumReduceTasks(0);

    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(VectorWritable.class);

    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    job.setInputFormatClass(TextInputFormat.class);

    TextInputFormat.addInputPath(job, new Path(inputPath));
    SequenceFileOutputFormat.setOutputPath(job, new Path(outputPath));

    // submit and wait for completion
    job.waitForCompletion(true);
}

}

Here is the error from the command line:

12/10/30 10:43:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/10/30 10:43:33 INFO input.FileInputFormat: Total input paths to process : 1
12/10/30 10:43:33 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/10/30 10:43:33 WARN snappy.LoadSnappy: Snappy native library not loaded
12/10/30 10:43:34 INFO mapred.JobClient: Running job: job_201210300947_0005
12/10/30 10:43:35 INFO mapred.JobClient:  map 0% reduce 0%
12/10/30 10:43:50 INFO mapred.JobClient: Task Id : attempt_201210300947_0005_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.mahout.math.VectorWritable
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:899)
    at org.apache.hadoop.mapred.JobConf.getOutputValueClass(JobConf.java:929)
    at org.apache.hadoop.mapreduce.JobContext.getOutputValueClass(JobContext.java:145)
    at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:61)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:628)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:753)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.mahout.math.VectorWritable
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:867)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:891)
    ... 11 more

Any ideas how to fix this or am I even trying to do this process correctly? I'm new to hadoop and mahout, so if I'm doing something the hard way, let me know. Thanks!

Solution

This is a very common problem, and almost certainly an issue with the way you are specifying your classpath in the hadoop command.

The way Hadoop works is that after you issue the "hadoop" command, it ships your job out to a TaskTracker to execute. So it's important to keep in mind that your job executes in a separate JVM, with its own classpath, etc. Part of what you are doing with the "hadoop" command is specifying the classpath that should be used.
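Since the task JVM only sees what was shipped with the job, a useful first check is whether the class ever made it into the jar you submitted. Something like the following (using the jar name from the question; this is a local sanity check, not part of the fix) will show it:

```shell
# List the submitted jar's entries and look for the Mahout class.
jar tf iris.jar | grep org/apache/mahout/math/VectorWritable.class
```

If no entry shows up, the class has to be bundled into the job jar or supplied separately on the task classpath, which is what the options described next do.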

If you are using Maven as your build system, I strongly recommend building a "fat jar" using the shade plugin. This builds a jar that contains all of your necessary dependencies, and you won't have to worry about classpath issues as you add dependencies to your Hadoop job, because you are shipping out a single jar.
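As a sketch, the relevant pom.xml fragment might look like the following (the plugin version here is illustrative; pick whatever is current for your Maven setup):

```xml
<build>
  <plugins>
    <!-- Bundle all runtime dependencies into the jar produced by `mvn package` -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>2.4.3</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

With this in place, `hadoop jar` the shaded artifact and the Mahout classes travel with the job.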

If you don't want to go this route, have a look at this article, which describes your problem and some potential solutions. In particular, this should work for you:

Include the JAR in the "-libjars" command line option of the hadoop jar … command.
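Concretely, with the paths from the question, that would look something like the following (the mahout-math jar location is a guess based on the question's directory layout; note also that -libjars is handled by GenericOptionsParser, so the driver needs to implement Tool / run via ToolRunner for it to take effect — the warning at the top of the job log is hinting at exactly that):

```shell
# Ship the Mahout jar to the task JVMs alongside the job jar.
hadoop jar iris.jar edu.iris.seq.CsvToSequenceFile \
    -libjars /home/stephen/home/libs/mahout-distribution-0.7/math/target/mahout-math-0.7.jar \
    iris-data iris-seq
```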
