Spark 2.0 DataSourceRegister configuration error while saving DataFrame as CSV


Problem Description

I'm trying to save a DataFrame to CSV in Spark 2.0 with Scala 2.11 (as part of migrating code from Spark 1.6).

sparkSession.sql("SELECT * FROM myTable").
      coalesce(1).
      write.
      format("com.databricks.spark.csv").
      option("header","true").
      save(config.resultLayer)

Is the Spark session built correctly?

implicit val sparkSession = SparkSession.builder
  .master("local")
  .appName("com.yo.go")
  .enableHiveSupport()
  .getOrCreate()

The error occurs only at runtime (the code compiles).

Exception in thread "main" java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.hive.orc.DefaultSource could not be instantiated
    at java.util.ServiceLoader.fail(ServiceLoader.java:224)
    at java.util.ServiceLoader.access$100(ServiceLoader.java:181)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:377)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
    at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
    at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
    at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
    at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:126)
    at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:78)
    at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:78)
    at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:427)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
    at com.apple.geo.contigu.common.JoinFeatures$.savePairSummaries(JoinFeatures.scala:343)
    at com.apple.geo.contigu.Main$.main(Main.scala:32)
    at com.apple.geo.contigu.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.VerifyError: Bad return type
Exception Details:
  Location:
    org/apache/spark/sql/hive/orc/DefaultSource.createRelation(Lorg/apache/spark/sql/SQLContext;[Ljava/lang/String;Lscala/Option;Lscala/Option;Lscala/collection/immutable/Map;)Lorg/apache/spark/sql/sources/HadoopFsRelation; @35: areturn
  Reason:
    Type 'org/apache/spark/sql/hive/orc/OrcRelation' (current frame, stack[0]) is not assignable to 'org/apache/spark/sql/sources/HadoopFsRelation' (from method signature)
  Current Frame:
    bci: @35
    flags: { }
    locals: { 'org/apache/spark/sql/hive/orc/DefaultSource', 'org/apache/spark/sql/SQLContext', '[Ljava/lang/String;', 'scala/Option', 'scala/Option', 'scala/collection/immutable/Map' }
    stack: { 'org/apache/spark/sql/hive/orc/OrcRelation' }
  Bytecode:
    0000000: b200 1c2b c100 1ebb 000e 592a b700 22b6
    0000010: 0026 bb00 2859 2c2d b200 2d19 0419 052b
    0000020: b700 30b0                              

    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:2595)
    at java.lang.Class.getConstructor0(Class.java:2895)
    at java.lang.Class.newInstance(Class.java:354)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
    ... 27 more

Is there something obvious that I've overlooked? Do you need more details? Any advice is appreciated. Thanks!

Answer

I ran into roughly the same situation. If you just want to do a quick run on a local machine instead of a fully configured cluster:

  • Turn off enableHiveSupport() when building the SparkSession (see the sketches after this list).
  • In pom.xml, make sure the related Spark dependencies are <scope>provided</scope>.
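
The VerifyError in the trace shows createRelation returning org.apache.spark.sql.sources.HadoopFsRelation, a type that no longer exists under that package in Spark 2.0, which suggests a leftover Spark 1.x Hive jar on the classpath; both points above address that. As a minimal sketch of the first point, here is the builder without enableHiveSupport(), with the write rewritten against Spark 2.0's built-in csv source (so the external com.databricks.spark.csv package is no longer needed). The output path is a hypothetical stand-in for config.resultLayer, and myTable is assumed to be registered as a temporary view, since Hive tables are unavailable once Hive support is off:

import org.apache.spark.sql.SparkSession

// Quick local run: no enableHiveSupport(), so the Hive ORC
// DataSourceRegister that failed to instantiate is never loaded.
val sparkSession = SparkSession.builder
  .master("local")
  .appName("com.yo.go")
  .getOrCreate()

// "myTable" must be a temporary view here; Hive tables are not
// visible without Hive support.
sparkSession.sql("SELECT * FROM myTable")
  .coalesce(1)
  .write
  .format("csv")              // built into Spark 2.0, replaces com.databricks.spark.csv
  .option("header", "true")
  .save("/tmp/result")        // hypothetical path standing in for config.resultLayer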
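
For the second point, a sketch of what the pom.xml entries might look like; the artifact names and version are assumptions matching the question's Spark 2.0 / Scala 2.11 setup. The provided scope keeps these jars out of your assembled artifact, so the cluster's own Spark 2.0 jars are the only ones on the runtime classpath:

<!-- Assumed coordinates; adjust to your actual Spark version. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.0.0</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive_2.11</artifactId>
  <version>2.0.0</version>
  <scope>provided</scope>
</dependency>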
