ClassNotFoundException running GiraphRunner on a modified SimpleShortestPathsVertex


Problem description


I'm relatively new to Giraph and I'm trying to get my Giraph edit-compile-deploy loop working for our code. I am able to run various examples inspired by http://blog.cloudera.com/blog/2014/02/how-to-write-and-run-giraph-jobs-on-hadoop/ , but I'm stuck with a ClassNotFoundException when running my modified version of the SimpleShortestPathsVertex Giraph example. I've tried various combinations of -libjars and HADOOP_CLASSPATH, but I'm out of ideas and I'd really appreciate your help. Details follow.

Versions

  • Hadoop: Hadoop 2.0.0-cdh4.4.0
  • Giraph: giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar

The PageRankBenchmark runs OK

$ hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
org.apache.giraph.benchmark.PageRankBenchmark \
-Dgiraph.zkList=<myhost>:2181 \
-e 1 -s 3 -v -V 50 -w 1

...
14/08/01 11:42:44 INFO mapred.JobClient: Job complete: job_201407291058_0015
...
(full output is below)

The GiraphRunner SimpleShortestPathsVertex also runs OK

$ hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
-Dgiraph.zkList=<myhost>:2181 \
org.apache.giraph.examples.SimpleShortestPathsVertex \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip ginput/tiny_graph.txt \
-of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op goutput/shortestpathsC2 \
-ca SimpleShortestPathsVertex.source=2 \
-w 1

...
14/08/01 11:47:46 INFO mapred.JobClient: Job complete: job_201407291058_0017
...
(full output is below)

Bonus: the results are correct:

$ hadoop fs -cat goutput/shortestpathsC2/p*
0   1.0
2   2.0
1   0.0
3   1.0
4   5.0
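
One way to sanity-check output like this is to load it into a map and look for the vertex at distance 0.0, since that is the vertex the run actually used as its source. The sketch below is a hypothetical helper (ShortestPathsOutput is not part of Giraph); it assumes each line holds a vertex id and a value separated by whitespace, as in the listing above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShortestPathsOutput {

    // Parses lines of the form "<vertexId><whitespace><value>" into a map
    // sorted by vertex id. Blank lines are skipped.
    static Map<Long, Double> parse(Iterable<String> lines) {
        Map<Long, Double> distances = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("unexpected line: " + line);
            }
            distances.put(Long.parseLong(parts[0]), Double.parseDouble(parts[1]));
        }
        return distances;
    }

    // Returns the ids whose distance is exactly 0.0, i.e. the vertex (or
    // vertices) the job treated as the source.
    static List<Long> zeroDistanceVertices(Map<Long, Double> distances) {
        List<Long> sources = new ArrayList<>();
        for (Map.Entry<Long, Double> e : distances.entrySet()) {
            if (e.getValue() == 0.0) {
                sources.add(e.getKey());
            }
        }
        return sources;
    }

    public static void main(String[] args) {
        // The five lines printed by `hadoop fs -cat goutput/shortestpathsC2/p*` above.
        List<String> lines = Arrays.asList("0\t1.0", "2\t2.0", "1\t0.0", "3\t1.0", "4\t5.0");
        Map<Long, Double> distances = parse(lines);
        System.out.println(distances);                       // {0=1.0, 1=0.0, 2=2.0, 3=1.0, 4=5.0}
        System.out.println(zeroDistanceVertices(distances)); // [1]
    }
}
```

On the five lines above, zeroDistanceVertices returns [1]: vertex 1, not vertex 2, sits at distance 0.0 even though the command passed source=2. That would be consistent with the example reading its source id from a different key (in the Giraph 1.0 examples this appears to be SimpleShortestPathsVertex.sourceId, defaulting to 1), so it may be worth checking the key name before reading these numbers as paths from vertex 2.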

But my modified version of SimpleShortestPathsVertex gets ClassNotFoundException

The jar containing the modified vertex (KdlSimpleShortestPathsVertex, no package) is OK:

$ jar -tf ~/kdl_hadoop_play.jar
META-INF/MANIFEST.MF
KdlSimpleShortestPathsVertex.class
META-INF/

But my run pukes:

$ hadoop jar $GIRAPH_HOME/giraph-core/target/giraph-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
-Dgiraph.zkList=<myhost>:2181 \
-libjars ~/kdl_hadoop_play.jar \
KdlSimpleShortestPathsVertex \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /user/cornell/ginput/tiny_graph.txt \
-of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /user/cornell/goutput/shortestpathsC2 \
-ca KdlSimpleShortestPathsVertex.source=2 \
-w 1

Exception in thread "main" java.lang.ClassNotFoundException: KdlSimpleShortestPathsVertex
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at org.apache.giraph.utils.ConfigurationUtils.populateGiraphConfiguration(ConfigurationUtils.java:210)
at org.apache.giraph.utils.ConfigurationUtils.parseArgs(ConfigurationUtils.java:147)
at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:74)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
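
The trace shows the lookup failing in the main thread, inside ConfigurationUtils.parseArgs, before any job is submitted, so the first question is whether the client JVM can see the class at all. The one-argument Class.forName(String) resolves names against the defining class loader of its caller. A minimal sketch for checking this outside Giraph (ClassVisibilityCheck is a hypothetical name) tries the application class path first and then, optionally, a child URLClassLoader built from extra jar paths:

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;

public class ClassVisibilityCheck {

    // Resolves a class by name without initializing it and reports
    // whether the given loader (or one of its parents) can see it.
    static boolean isVisible(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Builds a child loader over extra jar paths, delegating to this class's loader.
    static URLClassLoader withJars(String... jarPaths) throws MalformedURLException {
        URL[] urls = new URL[jarPaths.length];
        for (int i = 0; i < jarPaths.length; i++) {
            urls[i] = new File(jarPaths[i]).toURI().toURL();
        }
        return new URLClassLoader(urls, ClassVisibilityCheck.class.getClassLoader());
    }

    public static void main(String[] args) throws Exception {
        // Usage: java ClassVisibilityCheck <class-name> [jar ...]
        String className = args[0];
        ClassLoader appLoader = ClassVisibilityCheck.class.getClassLoader();
        System.out.println("application class path: " + isVisible(className, appLoader));
        if (args.length > 1) {
            String[] jars = Arrays.copyOfRange(args, 1, args.length);
            try (URLClassLoader child = withJars(jars)) {
                System.out.println("with extra jars: " + isVisible(className, child));
            }
        }
    }
}
```

For example, running it as `java ClassVisibilityCheck KdlSimpleShortestPathsVertex ~/kdl_hadoop_play.jar` would be expected to print false for the application class path and true once the jar is added, which mirrors the gap between having the jar on disk and having it on the launching JVM's class path.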

My best guess ...

...after looking around is that maybe GiraphRunner is not processing the -libjars correctly, as hinted at by http://grepalex.com/2013/02/25/hadoop-libjars/ ("Make sure your code is using GenericOptionsParser"). Browsing the Giraph source, I do not see that class accessed. I tried setting HADOOP_CLASSPATH to my jar, but that didn't solve the problem.

Any help would be awesome!

PageRankBenchmark output

14/08/01 11:42:27 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
14/08/01 11:42:28 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/08/01 11:42:28 WARN bsp.BspOutputFormat: checkOutputSpecs: ImmutableOutputCommiter will not check anything
14/08/01 11:42:29 INFO mapred.JobClient: Running job: job_201407291058_0015
14/08/01 11:42:30 INFO mapred.JobClient:  map 0% reduce 0%
14/08/01 11:42:40 INFO mapred.JobClient:  map 50% reduce 0%
14/08/01 11:42:41 INFO mapred.JobClient:  map 100% reduce 0%
14/08/01 11:42:44 INFO mapred.JobClient: Job complete: job_201407291058_0015
14/08/01 11:42:44 INFO mapred.JobClient: Counters: 39
14/08/01 11:42:44 INFO mapred.JobClient:   File System Counters
14/08/01 11:42:44 INFO mapred.JobClient:     FILE: Number of bytes read=0
14/08/01 11:42:44 INFO mapred.JobClient:     FILE: Number of bytes written=369846
14/08/01 11:42:44 INFO mapred.JobClient:     FILE: Number of read operations=0
14/08/01 11:42:44 INFO mapred.JobClient:     FILE: Number of large read operations=0
14/08/01 11:42:44 INFO mapred.JobClient:     FILE: Number of write operations=0
14/08/01 11:42:44 INFO mapred.JobClient:     HDFS: Number of bytes read=88
14/08/01 11:42:44 INFO mapred.JobClient:     HDFS: Number of bytes written=0
14/08/01 11:42:44 INFO mapred.JobClient:     HDFS: Number of read operations=2
14/08/01 11:42:44 INFO mapred.JobClient:     HDFS: Number of large read operations=0
14/08/01 11:42:44 INFO mapred.JobClient:     HDFS: Number of write operations=1
14/08/01 11:42:44 INFO mapred.JobClient:   Job Counters 
14/08/01 11:42:44 INFO mapred.JobClient:     Launched map tasks=2
14/08/01 11:42:44 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=15772
14/08/01 11:42:44 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient:   Map-Reduce Framework
14/08/01 11:42:44 INFO mapred.JobClient:     Map input records=2
14/08/01 11:42:44 INFO mapred.JobClient:     Map output records=0
14/08/01 11:42:44 INFO mapred.JobClient:     Input split bytes=88
14/08/01 11:42:44 INFO mapred.JobClient:     Spilled Records=0
14/08/01 11:42:44 INFO mapred.JobClient:     CPU time spent (ms)=2230
14/08/01 11:42:44 INFO mapred.JobClient:     Physical memory (bytes) snapshot=411357184
14/08/01 11:42:44 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2428895232
14/08/01 11:42:44 INFO mapred.JobClient:     Total committed heap usage (bytes)=806027264
14/08/01 11:42:44 INFO mapred.JobClient:   Giraph Stats
14/08/01 11:42:44 INFO mapred.JobClient:     Aggregate edges=50
14/08/01 11:42:44 INFO mapred.JobClient:     Aggregate finished vertices=50
14/08/01 11:42:44 INFO mapred.JobClient:     Aggregate vertices=50
14/08/01 11:42:44 INFO mapred.JobClient:     Current master task partition=0
14/08/01 11:42:44 INFO mapred.JobClient:     Current workers=1
14/08/01 11:42:44 INFO mapred.JobClient:     Last checkpointed superstep=0
14/08/01 11:42:44 INFO mapred.JobClient:     Sent messages=0
14/08/01 11:42:44 INFO mapred.JobClient:     Superstep=4
14/08/01 11:42:44 INFO mapred.JobClient:   Giraph Timers
14/08/01 11:42:44 INFO mapred.JobClient:     Input superstep (milliseconds)=238
14/08/01 11:42:44 INFO mapred.JobClient:     Setup (milliseconds)=2903
14/08/01 11:42:44 INFO mapred.JobClient:     Shutdown (milliseconds)=68
14/08/01 11:42:44 INFO mapred.JobClient:     Superstep 0 (milliseconds)=77
14/08/01 11:42:44 INFO mapred.JobClient:     Superstep 1 (milliseconds)=64
14/08/01 11:42:44 INFO mapred.JobClient:     Superstep 2 (milliseconds)=45
14/08/01 11:42:44 INFO mapred.JobClient:     Superstep 3 (milliseconds)=43
14/08/01 11:42:44 INFO mapred.JobClient:     Total (milliseconds)=3442

SimpleShortestPathsVertex output

14/08/01 11:47:37 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
14/08/01 11:47:37 INFO utils.ConfigurationUtils: Setting custom argument [SimpleShortestPathsVertex.source] to [2] in GiraphConfiguration
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
14/08/01 11:47:37 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
14/08/01 11:47:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/08/01 11:47:38 INFO mapred.JobClient: Running job: job_201407291058_0017
14/08/01 11:47:39 INFO mapred.JobClient:  map 0% reduce 0%
14/08/01 11:47:44 INFO mapred.JobClient:  map 50% reduce 0%
14/08/01 11:47:45 INFO mapred.JobClient:  map 100% reduce 0%
14/08/01 11:47:46 INFO mapred.JobClient: Job complete: job_201407291058_0017
14/08/01 11:47:46 INFO mapred.JobClient: Counters: 39
14/08/01 11:47:46 INFO mapred.JobClient:   File System Counters
14/08/01 11:47:46 INFO mapred.JobClient:     FILE: Number of bytes read=0
14/08/01 11:47:46 INFO mapred.JobClient:     FILE: Number of bytes written=367068
14/08/01 11:47:46 INFO mapred.JobClient:     FILE: Number of read operations=0
14/08/01 11:47:46 INFO mapred.JobClient:     FILE: Number of large read operations=0
14/08/01 11:47:46 INFO mapred.JobClient:     FILE: Number of write operations=0
14/08/01 11:47:46 INFO mapred.JobClient:     HDFS: Number of bytes read=200
14/08/01 11:47:46 INFO mapred.JobClient:     HDFS: Number of bytes written=30
14/08/01 11:47:46 INFO mapred.JobClient:     HDFS: Number of read operations=5
14/08/01 11:47:46 INFO mapred.JobClient:     HDFS: Number of large read operations=0
14/08/01 11:47:46 INFO mapred.JobClient:     HDFS: Number of write operations=2
14/08/01 11:47:46 INFO mapred.JobClient:   Job Counters 
14/08/01 11:47:46 INFO mapred.JobClient:     Launched map tasks=2
14/08/01 11:47:46 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=8538
14/08/01 11:47:46 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient:   Map-Reduce Framework
14/08/01 11:47:46 INFO mapred.JobClient:     Map input records=2
14/08/01 11:47:46 INFO mapred.JobClient:     Map output records=0
14/08/01 11:47:46 INFO mapred.JobClient:     Input split bytes=88
14/08/01 11:47:46 INFO mapred.JobClient:     Spilled Records=0
14/08/01 11:47:46 INFO mapred.JobClient:     CPU time spent (ms)=1590
14/08/01 11:47:46 INFO mapred.JobClient:     Physical memory (bytes) snapshot=341344256
14/08/01 11:47:46 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2363527168
14/08/01 11:47:46 INFO mapred.JobClient:     Total committed heap usage (bytes)=504758272
14/08/01 11:47:46 INFO mapred.JobClient:   Giraph Stats
14/08/01 11:47:46 INFO mapred.JobClient:     Aggregate edges=12
14/08/01 11:47:46 INFO mapred.JobClient:     Aggregate finished vertices=5
14/08/01 11:47:46 INFO mapred.JobClient:     Aggregate vertices=5
14/08/01 11:47:46 INFO mapred.JobClient:     Current master task partition=0
14/08/01 11:47:46 INFO mapred.JobClient:     Current workers=1
14/08/01 11:47:46 INFO mapred.JobClient:     Last checkpointed superstep=0
14/08/01 11:47:46 INFO mapred.JobClient:     Sent messages=0
14/08/01 11:47:46 INFO mapred.JobClient:     Superstep=4
14/08/01 11:47:46 INFO mapred.JobClient:   Giraph Timers
14/08/01 11:47:46 INFO mapred.JobClient:     Input superstep (milliseconds)=181
14/08/01 11:47:46 INFO mapred.JobClient:     Setup (milliseconds)=313
14/08/01 11:47:46 INFO mapred.JobClient:     Shutdown (milliseconds)=128
14/08/01 11:47:46 INFO mapred.JobClient:     Superstep 0 (milliseconds)=57
14/08/01 11:47:46 INFO mapred.JobClient:     Superstep 1 (milliseconds)=54
14/08/01 11:47:46 INFO mapred.JobClient:     Superstep 2 (milliseconds)=36
14/08/01 11:47:46 INFO mapred.JobClient:     Superstep 3 (milliseconds)=35
14/08/01 11:47:46 INFO mapred.JobClient:     Total (milliseconds)=805

Solution

OK, after looking at the hadoop scripts along with the Hadoop and Giraph source, I think I figured it out. The big hint came from the "Using the libjars option with Hadoop" post, along with this line from the output:

WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

The cause appears to be that GiraphRunner uses its own ConfigurationUtils.parseArgs() to get the org.apache.commons.cli.CommandLine instead of using the recommended org.apache.hadoop.util.GenericOptionsParser.getCommandLine(), which honors the 'libjars' option. This led me to fall back on Hadoop's generic classpath-handling tools: CLASSPATH and/or HADOOP_CLASSPATH. Here's what worked:

  • Set HADOOP_CLASSPATH to include your application jar and the Giraph core jar, using a colon delimiter.
  • Pass -libjars using that same classpath but with a comma delimiter.
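
Since the two settings carry the same jars with different separators, one way to avoid typos is to derive the -libjars value from the colon-delimited list. A minimal sketch (LibJarsFromClasspath is a hypothetical helper, not a Hadoop or Giraph API) that also drops empty entries, such as the one left by a trailing :$HADOOP_CLASSPATH when that variable was previously unset:

```java
import java.util.ArrayList;
import java.util.List;

public class LibJarsFromClasspath {

    // Converts a colon-delimited HADOOP_CLASSPATH-style string into the
    // comma-delimited list that -libjars expects, skipping empty entries.
    static String toLibJars(String colonClasspath) {
        List<String> jars = new ArrayList<>();
        for (String entry : colonClasspath.split(":")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                jars.add(trimmed);
            }
        }
        return String.join(",", jars);
    }

    public static void main(String[] args) {
        // Reads HADOOP_CLASSPATH from the environment; prints an empty line if unset.
        String cp = System.getenv().getOrDefault("HADOOP_CLASSPATH", "");
        System.out.println(toLibJars(cp));
    }
}
```

This assumes every entry is a plain jar path. HADOOP_CLASSPATH may also contain directories or wildcard entries, and any pre-existing entries get carried over as well, so it is worth trimming the result down to the jars the tasks actually need.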

For example, on my machine:

$ export GIRAPH_HOME=/share/apps/giraph
$ export HADOOP_CLASSPATH=/home/<me>/kdl_hadoop_play.jar:$GIRAPH_HOME/giraph-ex.jar:$HADOOP_CLASSPATH
$ export LIBJARS=/home/<me>/kdl_hadoop_play.jar,$GIRAPH_HOME/giraph-core.jar
$ hadoop fs -rm -R goutput/shortestpathsC2
$ hadoop jar $GIRAPH_HOME/giraph-ex.jar org.apache.giraph.GiraphRunner \
-Dgiraph.zkList=<myhost>:2181 \
-libjars ${LIBJARS} \
KdlSimpleShortestPathsVertex \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /user/cornell/ginput/tiny_graph.txt \
-of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /user/cornell/goutput/shortestpathsC2 \
-ca SimpleShortestPathsVertex.source=2 \
-w 1
...
$ hadoop fs -cat goutput/shortestpathsC2/p*

This gives the expected output and results.

More generally, it would be helpful if the Giraph team changed the code to use the (apparently) more standard parser.

Hope that helps!
