Zeppelin + Spark: Reading Parquet from S3 throws NoSuchMethodError: com.fasterxml.jackson

Question

Using the Zeppelin 0.7.2 binaries from the main download page and Spark 2.1.0 built with Hadoop 2.6, the following paragraph:

val df = spark.read.parquet(DATA_URL).filter(FILTER_STRING).na.fill("")

produces the following:

java.lang.NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer$.handledType()Ljava/lang/Class;
  at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<init>(ScalaNumberDeserializersModule.scala:49)
  at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<clinit>(ScalaNumberDeserializersModule.scala)
  at com.fasterxml.jackson.module.scala.deser.ScalaNumberDeserializersModule$class.$init$(ScalaNumberDeserializersModule.scala:61)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule.<init>(DefaultScalaModule.scala:20)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule$.<init>(DefaultScalaModule.scala:37)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule$.<clinit>(DefaultScalaModule.scala)
  at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
  at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
  at org.apache.spark.SparkContext.withScope(SparkContext.scala:701)
  at org.apache.spark.SparkContext.parallelize(SparkContext.scala:715)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.mergeSchemasInParallel(ParquetFileFormat.scala:594)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.inferSchema(ParquetFileFormat.scala:235)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
  at scala.Option.orElse(Option.scala:289)
  at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$getOrInferFileFormatSchema(DataSource.scala:183)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:387)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:441)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:425)
  ... 47 elided

This error happens only in Zeppelin, not in the normal spark-shell. I have tried the following fixes, none of which had any effect:

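  • Downloading the Jackson 2.6.2 jars into the Zeppelin lib folder and restarting
  • Adding the Jackson 2.9 dependencies from the Maven repository to the interpreter settings
  • Deleting the Jackson jars from the Zeppelin lib folder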

Googling turns up no similar situations. Please don't hesitate to ask for more information or to make suggestions. Thanks!

Recommended answer

I had the same problem. I had added com.amazonaws:aws-java-sdk and org.apache.hadoop:hadoop-aws as dependencies for the Spark interpreter. These dependencies bring in their own versions of com.fasterxml.jackson.core:*, which conflict with the versions Spark uses.

You must also exclude com.fasterxml.jackson.core:* from those dependencies. Here is an example of the Spark interpreter dependencies section in ${ZEPPELIN_HOME}/conf/interpreter.json:

"dependencies": [
  {
    "groupArtifactVersion": "com.amazonaws:aws-java-sdk:1.7.4",
    "local": false,
    "exclusions": ["com.fasterxml.jackson.core:*"]
  },
  {
    "groupArtifactVersion": "org.apache.hadoop:hadoop-aws:2.7.1",
    "local": false,
    "exclusions": ["com.fasterxml.jackson.core:*"]
  }
]

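To check which Jackson jars the Spark interpreter actually loads, before or after adding the exclusions, a paragraph along these lines may help. It is only a diagnostic sketch: the helper names codeSourceOf and implVersionOf are made up for this example, and it assumes the Jackson jars carry an Implementation-Version entry in their manifest, which is not guaranteed.

```scala
import scala.util.Try

// Hypothetical helper: loads a class by name WITHOUT initializing it, so a
// broken static initializer (as in the stack trace above) is not triggered.
def loadQuietly(className: String): Option[Class[_]] =
  Try(Class.forName(className, false, Thread.currentThread.getContextClassLoader)).toOption

// Hypothetical helper: where the class was loaded from (usually a jar URL).
// Classes from the bootstrap class loader have no CodeSource, hence the Option.
def codeSourceOf(className: String): Option[String] =
  loadQuietly(className)
    .flatMap(c => Option(c.getProtectionDomain.getCodeSource))
    .flatMap(cs => Option(cs.getLocation))
    .map(_.toString)

// Hypothetical helper: Implementation-Version from the jar manifest, if present.
def implVersionOf(className: String): Option[String] =
  loadQuietly(className)
    .flatMap(c => Option(c.getPackage))
    .flatMap(p => Option(p.getImplementationVersion))

// One class each from jackson-databind, jackson-core and jackson-module-scala.
Seq(
  "com.fasterxml.jackson.databind.ObjectMapper",
  "com.fasterxml.jackson.core.JsonFactory",
  "com.fasterxml.jackson.module.scala.DefaultScalaModule"
).foreach { name =>
  val source = codeSourceOf(name).getOrElse("not found")
  val version = implVersionOf(name).getOrElse("unknown")
  println(s"$name -> $source (version: $version)")
}
```

If the three entries report different versions, or point at jars outside the Spark distribution, that would be consistent with the version conflict described above. Running the same paragraph in spark-shell gives a baseline to compare against.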