The error java.lang.ClassNotFoundException when running spark-submit


Problem description

I want to run spark-submit for my Scala Spark application. These are the steps I took:

1) Execute Maven clean and package from IntelliJ IDEA to get myTest.jar
2) Execute the following spark-submit command:

spark-submit --name 28 --master local[2] --class org.test.consumer.TestRunner \
/usr/tests/test1/target/myTest.jar \
$arg1 $arg2 $arg3 $arg4 $arg5
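
(A quick sanity check at this point: listing the jar's contents shows whether the class was ever packaged at all. A diagnostic sketch, using the jar path from the command above:)

jar tf /usr/tests/test1/target/myTest.jar | grep TestRunner
# Expected if packaging worked:
#   org/test/consumer/TestRunner.class
#   org/test/consumer/TestRunner$.class
# No output means the class never made it into the jar.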

This is the TestRunner object I want to run:

package org.test.consumer

import org.test.consumer.kafka.KafkaConsumer

object TestRunner {

  def main(args: Array[String]) {

    // Destructure the five positional arguments passed after the jar;
    // this throws a scala.MatchError if the argument count differs.
    val Array(zkQuorum, group, topic1, topic2, kafkaNumThreads) = args

    val processor = new KafkaConsumer(zkQuorum, group, topic1, topic2)
    processor.run(kafkaNumThreads.toInt)

  }

}

But the spark-submit command fails with the following message:

java.lang.ClassNotFoundException: org.test.consumer.TestRunner
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:686)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)

I don't really understand why the object TestRunner cannot be found if the package is specified correctly... Does it have something to do with using an object instead of a class?
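
(As an aside: an object is the right choice for an entry point. For a top-level object, the Scala compiler emits a TestRunner class with a static main forwarder, which is exactly what spark-submit --class resolves. Once the class has actually been compiled, this can be verified with javap; a diagnostic sketch, assuming Maven's default output directory:)

javap -cp /usr/tests/test1/target/classes org.test.consumer.TestRunner
# Should list a static entry point:
#   public static void main(java.lang.String[]);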

Update:

The project structure (the scala folder is currently marked as Sources):

/usr/tests/test1
  .idea
  src
    main
      docker
      resources
      scala
        org
          test
            consumer
              kafka
                KafkaConsumer.scala
              TestRunner.scala
    test
  target

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.test.abc</groupId>
    <artifactId>consumer</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.11.8</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>1.6.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka_2.11</artifactId>
            <version>1.6.2</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.module</groupId>
            <artifactId>jackson-module-scala_2.11</artifactId>
            <version>2.7.5</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
            <version>2.7.5</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.7.5</version>
        </dependency>
        <dependency>
            <groupId>org.sedis</groupId>
            <artifactId>sedis_2.11</artifactId>
            <version>1.2.2</version>
        </dependency>
        <dependency>
            <groupId>com.lambdaworks</groupId>
            <artifactId>jacks_2.11</artifactId>
            <version>2.3.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>1.6.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib-local_2.11</artifactId>
            <version>2.0.0</version>
        </dependency>
        <dependency>
            <groupId>com.github.nscala-time</groupId>
            <artifactId>nscala-time_2.11</artifactId>
            <version>2.12.0</version>
        </dependency>
    </dependencies>

</project>

Answer

@FiofanS, the problem is in your directory structure.

Maven uses a convention-over-configuration policy. That means, by default, Maven expects you to follow the set of rules it has defined. For example, it expects all your code to be in the src/main/java directory (see the Maven Standard Directory Layout). But your code is not in src/main/java; it is in src/main/scala. By default, Maven will not consider src/main/scala a source location.

Although Maven expects you to follow the rules it has defined, it doesn't enforce them; it also provides ways to configure things to your preference. In your case, you have to explicitly instruct Maven to treat src/main/scala as one of your source locations as well.

To do this, use the Maven Build Helper Plugin. Add the following snippet within the <project>...</project> tag of your pom.xml:

  <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>build-helper-maven-plugin</artifactId>
        <version>1.7</version>
        <executions>
          <execution>
            <id>add-source</id>
            <phase>generate-sources</phase>
            <goals>
              <goal>add-source</goal>
            </goals>
            <configuration>
              <sources>
                <source>src/main/scala</source>
              </sources>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
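
One caveat: the Build Helper plugin only registers src/main/scala as an additional source root; something still has to compile the Scala sources, or the jar will contain no classes at all. If the build doesn't already bind a Scala compiler, the scala-maven-plugin is the usual choice. A minimal sketch to sit alongside the plugin above (the version shown is illustrative for a Scala 2.11 project):

      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.2</version>
        <executions>
          <execution>
            <goals>
              <!-- compile src/main/scala and src/test/scala -->
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>

Note that the scala-maven-plugin picks up src/main/scala by default, so with it in place the Build Helper entry is usually redundant unless additional source roots are involved.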

This should solve your problem.
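
Alternatively, if src/main/scala is the only source root in the project, the same effect can be achieved without the extra plugin by pointing Maven's standard sourceDirectory at it (a sketch of the relevant <build> fragment):

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
  </build>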
