Spark job Cassandra error
Question
I get this error every time I run my Scala program in Spark with the Cassandra connector:
Exception during preparation of SELECT count(*) FROM "eventtest"."simpletbl" WHERE token("a") > ? AND token("a") <= ?
ALLOW FILTERING: class org.joda.time.DateTime in JavaMirror with org.apache.spark.util.MutableURLClassLoader@23041911 of type class org.apache.spark.util.MutableURLClassLoader
with classpath
[file:
/home/sysadmin/ApacheSpark/spark-1.4.0-bin-hadoop2.4/work/app-20150711142923-0023/0/./spark-cassandra-connector_2.10-1.4.0-M1.jar
,file:
/home/sysadmin/ApacheSpark/spark-1.4.0-bin-hadoop2.4/work/app-20150711142923-0023/0/./cassandra-driver-core-2.1.5.jar,file:
/home/sysadmin/ApacheSpark/spark-1.4.0-bin-hadoop2.4/work/app-20150711142923-0023/0/./cassandra-spark-job_2.10-1.0.jar,file:
/home/sysadmin/ApacheSpark/spark-1.4.0-bin-hadoop2.4/work/app-20150711142923-0023/0/./guava-18.0.jar,file:
/home/sysadmin/ApacheSpark/spark-1.4.0-bin-hadoop2.4/work/app-20150711142923-0023/0/./joda-convert-1.2.jar,file:
/home/sysadmin/ApacheSpark/spark-1.4.0-bin-hadoop2.4/work/app-20150711142923-0023/0/./cassandra-clientutil-2.1.5.jar,file:
/home/sysadmin/ApacheSpark/spark-1.4.0-bin-hadoop2.4/work/app-20150711142923-0023/0/./google-collections-1.0.jar] and parent being sun.misc.Launcher$AppClassLoader@6132b73b of type class sun.misc.Launcher$AppClassLoader with classpath [file:
/home/sysadmin/ApacheSpark/spark-1.4.0-bin-hadoop2.4/conf/,file:
/home/sysadmin/ApacheSpark/spark-1.4.0-bin-hadoop2.4/lib/spark-assembly-1.4.0-hadoop2.4.0.jar,file:
/home/sysadmin/ApacheSpark/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar,file:
/home/sysadmin/ApacheSpark/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar,file:
/home/sysadmin/ApacheSpark/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar] and parent being sun.misc.Launcher$ExtClassLoader@489bb457 of type class sun.misc.Launcher$ExtClassLoader with classpath [file:
/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/ext/dnsns.jar,file:
/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/ext/sunpkcs11.jar,file:
/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/ext/sunjce_provider.jar,file:
/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/ext/zipfs.jar,file:
/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/ext/libatk-wrapper.so,file:
/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/ext/java-atk-wrapper.jar,file:
/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/ext/localedata.jar,file:
/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/ext/icedtea-sound.jar] and parent being primordial classloader with boot classpath [/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/resources.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/rt.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/sunrsasign.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/jsse.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/jce.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/charsets.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/rhino.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/jfr.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/classes] not found.
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.createStatement(CassandraTableScanRDD.scala:163)
Here is my program:
/** CassandraJob.scala **/
import com.datastax.spark.connector._
import org.apache.spark._

object CassandraJob {
  def main(args: Array[String]): Unit = {
    // Connection settings for the Cassandra node
    val conf = new SparkConf(true)
      .set("spark.cassandra.connection.host", "172.28.0.164")
      .set("spark.cassandra.connection.rpc.port", "9160")
    val sc = new SparkContext(conf)
    // Read the table eventtest.simpletbl as an RDD
    val rdd = sc.cassandraTable("eventtest", "simpletbl")
    println("cassandra row count : " + rdd.count + ", cassandra row : " + rdd.first)
  }
}
I built the file using sbt compile and sbt package.
Here is how I submit the Spark job:
./bin/spark-submit --jars $(echo /home/sysadmin/ApacheSpark/jar/*.jar | tr ' ' ',') --class "CassandraJob" --master spark://noi-cs-01:7077 /home/sysadmin/ApacheSparkProj/CassandraJob/target/scala-2.10/cassandra-spark-job_2.10-1.0.jar
Recommended answer
I guess that you are using org.joda.time.DateTime, which is missing from the jars you submit. The classpath in the error message lists joda-convert-1.2.jar but no Joda-Time jar, so the class cannot be found. Just add this jar to your dependencies: ... --jars $(echo /home/sysadmin/ApacheSpark/jar/*.jar | tr ' ' ','),/PATH/TO/DOWNLOADED/JODATIME/JAR --class "CassandraJob" ...
The other way is to add the library that provides org.joda.time.DateTime to the library dependencies in sbt and build a fat jar that includes it, using the sbt assembly plugin instead of sbt package.
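The fat-jar approach could look roughly like the build.sbt below. This is only a sketch under assumptions: the project name, version, and connector version are read off the jar names in the question, while the Scala patch version, the Joda-Time version, and the sbt-assembly plugin version are guesses that should be checked against your own setup.

```scala
// build.sbt (sketch; versions marked "assumed" are not from the question)
//
// Also requires the sbt-assembly plugin in project/plugins.sbt, for example:
//   addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")  // plugin version assumed

name := "cassandra-spark-job"

version := "1.0"

scalaVersion := "2.10.4" // 2.10 line per the jar names; patch version assumed

libraryDependencies ++= Seq(
  // Spark is supplied by the cluster, so keep it out of the fat jar
  "org.apache.spark" %% "spark-core" % "1.4.0" % "provided",
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.4.0-M1",
  // Provides org.joda.time.DateTime; version assumed
  "joda-time" % "joda-time" % "2.3"
)
```

With this in place, running sbt assembly instead of sbt package should produce a single jar (typically under target/scala-2.10/ with "assembly" in its name) that can be passed to spark-submit without the long --jars list.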