The error java.lang.ClassNotFoundException when running spark-submit
Question
I want to run spark-submit for my Scala Spark application. These are the steps I did:
1) execute Maven Clean and Package from IntelliJ IDEA to get myTest.jar
2) execute the following spark-submit command:
spark-submit --name 28 --master local[2] --class org.test.consumer.TestRunner \
/usr/tests/test1/target/myTest.jar \
$arg1 $arg2 $arg3 $arg4 $arg5
This is the TestRunner object that I want to run:
package org.test.consumer

import org.test.consumer.kafka.KafkaConsumer

object TestRunner {

  def main(args: Array[String]) {
    val Array(zkQuorum, group, topic1, topic2, kafkaNumThreads) = args
    val processor = new KafkaConsumer(zkQuorum, group, topic1, topic2)
    processor.run(kafkaNumThreads.toInt)
  }

}
But the spark-submit command fails with the following message:
java.lang.ClassNotFoundException: org.test.consumer.TestRunner
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:686)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
I don't really understand why the object TestRunner cannot be found if the package is specified correctly... Does it have something to do with the usage of object instead of class?
UPDATE:
The project structure (the folder scala is currently marked as Sources):
/usr/tests/test1
    .idea
    src
        main
            docker
            resources
            scala
                org
                    test
                        consumer
                            kafka
                                KafkaConsumer.scala
                            TestRunner.scala
        test
    target
    pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.test.abc</groupId>
    <artifactId>consumer</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.11.8</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>1.6.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka_2.11</artifactId>
            <version>1.6.2</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.module</groupId>
            <artifactId>jackson-module-scala_2.11</artifactId>
            <version>2.7.5</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
            <version>2.7.5</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.7.5</version>
        </dependency>
        <dependency>
            <groupId>org.sedis</groupId>
            <artifactId>sedis_2.11</artifactId>
            <version>1.2.2</version>
        </dependency>
        <dependency>
            <groupId>com.lambdaworks</groupId>
            <artifactId>jacks_2.11</artifactId>
            <version>2.3.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>1.6.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib-local_2.11</artifactId>
            <version>2.0.0</version>
        </dependency>
        <dependency>
            <groupId>com.github.nscala-time</groupId>
            <artifactId>nscala-time_2.11</artifactId>
            <version>2.12.0</version>
        </dependency>
    </dependencies>
</project>
Answer
@FiofanS, the problem is in your directory structure.
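One quick way to confirm this (assuming the JDK's jar tool is on your PATH) is to list the jar's contents; if the build never picked up src/main/scala, the compiled class will be missing:

```shell
# List the jar's entries and look for the compiled TestRunner class.
# No output here means the class was never compiled into the jar.
jar tf /usr/tests/test1/target/myTest.jar | grep TestRunner
```

You would expect to see org/test/consumer/TestRunner.class (and TestRunner$.class for the Scala object) in a correctly built jar.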
Maven uses a convention over configuration policy. This means that, by default, Maven expects you to follow the set of rules it has defined. For example, it expects you to put all your code in the src/main/java directory (see the Maven Standard Directory Structure). But you don't have your code in the src/main/java directory. Instead, you have it in the src/main/scala directory. By default, Maven will not consider src/main/scala a source location.
Although Maven expects you to follow the rules it has defined, it doesn't enforce them. It also provides ways for you to configure things based on your preference.
In your case, you will have to explicitly instruct Maven to also consider src/main/scala as one of your source locations.
To do this, you will have to use the Maven Build Helper Plugin.
Add the piece of code below within the <project>...</project> tag in your pom.xml:
<build>
    <plugins>
        <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>build-helper-maven-plugin</artifactId>
            <version>1.7</version>
            <executions>
                <execution>
                    <id>add-source</id>
                    <phase>generate-sources</phase>
                    <goals>
                        <goal>add-source</goal>
                    </goals>
                    <configuration>
                        <sources>
                            <source>src/main/scala</source>
                        </sources>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
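As a simpler alternative (a sketch, assuming src/main/scala is your only source root and you don't need to add several extra roots), Maven also lets you override the default source directory directly in the <build> section instead of adding one via the plugin:

```xml
<build>
    <!-- Replace Maven's default src/main/java source root
         with the Scala sources directory -->
    <sourceDirectory>src/main/scala</sourceDirectory>
</build>
```

Either way, after editing the pom, rerun Maven Clean and Package so the jar is rebuilt with the newly recognized sources.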
This should solve your problem.
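As for the object-vs-class part of your question: no, using object is not the problem, it is actually required. A Scala object is a singleton for which the compiler emits a static main forwarder, which is exactly what the JVM (and spark-submit's --class lookup) needs for an entry point; a plain class would have no static main. A minimal sketch, with a hypothetical name:

```scala
// An `object` with a `main` method is the standard Scala entry point.
// The compiler generates a static `main` forwarder, so the JVM can
// launch it just like a Java class with `public static void main`.
object HelloRunner {
  def main(args: Array[String]): Unit = {
    // Echo the arguments, mirroring how TestRunner receives its five args
    println(s"Received ${args.length} args: ${args.mkString(", ")}")
  }
}
```

So once the jar actually contains the compiled class, --class org.test.consumer.TestRunner will resolve your object just fine.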