java.lang.ClassNotFoundException: com.datastax.spark.connector.rdd.partitioner.CassandraPartition


Problem Description

I've been working with Cassandra for a little while, and now I'm trying to set up Spark and the spark-cassandra-connector. I'm using IntelliJ IDEA on Windows 10 to do that (it's my first time with IntelliJ IDEA and Scala, too).

build.gradle

apply plugin: 'scala'
apply plugin: 'idea'
apply plugin: 'eclipse'

repositories {
    mavenCentral()

    flatDir {
        dirs 'runtime libs'
    }
}

idea {
    project {
        jdkName = '1.8'
        languageLevel = '1.8'
    }
}

dependencies {
    compile group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.4.5'
    compile group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.4.5'
    compile group: 'org.scala-lang', name: 'scala-library', version: '2.11.12'
    compile group: 'com.datastax.spark', name: 'spark-cassandra-connector_2.11', version: '2.5.0'
    compile group: 'log4j', name: 'log4j', version: '1.2.17'
}

configurations.all {
    resolutionStrategy {
        force 'com.google.guava:guava:12.0.1'
    }
}

compileScala.targetCompatibility = "1.8"
compileScala.sourceCompatibility = "1.8"

jar {
    zip64 true
    archiveName = "ModuleName.jar"
    from {
        configurations.compile.collect {
            it.isDirectory() ? it : zipTree(it)
        }
    }
    manifest {
        attributes 'Main-Class': 'org.module.ModuelName'
    }
    exclude 'META-INF/*.RSA', 'META-INF/*.SF', 'META-INF/*.DSA'

}

ModuleName.scala

package org.module
import org.apache.spark.sql.SparkSession
import com.datastax.spark.connector._
import org.apache.spark.sql.types.TimestampType

object SentinelSparkModule {

  case class Document(id: Int, time: TimestampType, data: String)

  def main(args: Array[String]) {
    val spark = SparkSession.builder
      .master("spark://192.168.0.3:7077")
      .appName("App")
      .config("spark.cassandra.connection.host", "127.0.0.1")
      .config("spark.cassandra.connection.port", "9042")
      .getOrCreate()

    //I'm trying it without [Document] since it throws 'Failed to map constructor parameter id in
    //org.module.ModuleName.Document to a column of keyspace.table'

    val documentRDD = spark.sparkContext
      .cassandraTable/*[Document]*/("keyspace", "table")
      .select()
    documentRDD.take(10).foreach(println)
    spark.stop()
 }
}

I have a Spark master running at spark://192.168.0.3:7077 and a worker attached to that master, but I haven't tried to submit the job as a compiled jar from the console; I'm just trying to get it to work in the IDE.

Thanks

Recommended Answer

The Cassandra connector jar needs to be added to the classpath of the workers. One way to do this is to build an uber jar with all the required dependencies and submit it to the cluster.
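As an alternative to bundling everything yourself, `spark-submit` can resolve the connector from Maven Central and ship it to the driver and every executor via `--packages`. A sketch, where the coordinates match the versions in the question's build file, and the jar path and main class are assumptions based on the question's code (the object is `SentinelSparkModule`):

```shell
# Sketch: let spark-submit fetch the connector and distribute it to the
# workers, instead of baking it into the application jar.
# Coordinates assume Scala 2.11 and connector 2.5.0 as in build.gradle;
# the jar path and class name below are hypothetical.
spark-submit \
  --master spark://192.168.0.3:7077 \
  --packages com.datastax.spark:spark-cassandra-connector_2.11:2.5.0 \
  --class org.module.SentinelSparkModule \
  build/libs/ModuleName.jar
```

Note that this only helps when the job is actually submitted through `spark-submit`; when running straight from the IDE against a standalone master, the connector classes still have to reach the executors some other way (e.g. `spark.jars`).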

See: Building an uber jar with Gradle
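With Gradle, the Shadow plugin is a common way to produce such an uber jar instead of hand-rolling the `jar` task as in the question. A minimal sketch, assuming a Gradle version contemporary with Spark 2.4.x (the plugin version is an assumption):

```gradle
// Sketch: build the uber jar with the Shadow plugin. The plugin version
// here is an assumption for Gradle 5/6-era builds; adjust as needed.
buildscript {
    repositories { gradlePluginPortal() }
    dependencies {
        classpath 'com.github.jengelman.gradle.plugins:shadow:5.2.0'
    }
}
apply plugin: 'com.github.johnrengelman.shadow'

shadowJar {
    zip64 true
    archiveName = 'ModuleName.jar'
    manifest {
        // The main class is taken from the question's manifest; the actual
        // object in ModuleName.scala is SentinelSparkModule.
        attributes 'Main-Class': 'org.module.SentinelSparkModule'
    }
    // Dependency signature files would invalidate the merged jar.
    exclude 'META-INF/*.RSA', 'META-INF/*.SF', 'META-INF/*.DSA'
}
```

Running `gradle shadowJar` then produces the merged jar under `build/libs/`, with duplicate-file and signature handling taken care of by the plugin.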

Also, make sure you change the scope of the dependencies in your build file from compile to provided for all jars except the Cassandra connector.

参考: https://reflectoring.io/maven-scopes-gradle-configurations/
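In Gradle terms, that split could look like the following sketch. The `scala` plugin has no built-in provided scope, so `compileOnly` is the usual stand-in: those jars are available at compile time but stay out of the uber jar, because the Spark cluster already provides them.

```gradle
dependencies {
    // Needed only to compile; the cluster supplies these at runtime,
    // so keep them off the runtime/uber-jar classpath.
    compileOnly group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.4.5'
    compileOnly group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.4.5'
    compileOnly group: 'org.scala-lang', name: 'scala-library', version: '2.11.12'

    // Keep the connector in compile scope so it is bundled into the
    // uber jar and reaches the workers' classpath.
    compile group: 'com.datastax.spark', name: 'spark-cassandra-connector_2.11', version: '2.5.0'
}
```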

