Spark job fails due to java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions


Problem description

I am having a problem running a Spark job via spark-submit due to the following error:

16/11/16 11:41:12 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(org.apache.hadoop.fs.Path, java.lang.String, java.util.Map, boolean, int, boolean, boolean, boolean)
java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(org.apache.hadoop.fs.Path, java.lang.String, java.util.Map, boolean, int, boolean, boolean, boolean)
at java.lang.Class.getMethod(Class.java:1786)
at org.apache.spark.sql.hive.client.Shim.findMethod(HiveShim.scala:114)
at org.apache.spark.sql.hive.client.Shim_v0_14.loadDynamicPartitionsMethod$lzycompute(HiveShim.scala:404)
at org.apache.spark.sql.hive.client.Shim_v0_14.loadDynamicPartitionsMethod(HiveShim.scala:403)
at org.apache.spark.sql.hive.client.Shim_v0_14.loadDynamicPartitions(HiveShim.scala:455)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(ClientWrapper.scala:562)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply(ClientWrapper.scala:562)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply(ClientWrapper.scala:562)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:281)
at org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:228)
at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:227)
at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:270)
...

I am using Spark 1.6.0 with Scala 2.10 and Hive 1.1.0, and the platform is CDH 5.7.1 with the same versions of Spark and Hive. The hive-exec jar passed on the classpath to the Spark job is hive-exec-1.1.0-cdh5.7.1.jar. This jar contains the class org.apache.hadoop.hive.ql.metadata.Hive, which I can see has the following method:

public java.util.Map<java.util.Map<java.lang.String, java.lang.String>, org.apache.hadoop.hive.ql.metadata.Partition> loadDynamicPartitions(org.apache.hadoop.fs.Path, java.lang.String, java.util.Map<java.lang.String, java.lang.String>, boolean, int, boolean, boolean, boolean) throws org.apache.hadoop.hive.ql.metadata.HiveException;

This is not the same signature that the org.apache.spark.sql.hive.client.ClientWrapper class, shipped with the spark-hive_2.10-1.6.0.jar library I am using, expects; that class looks the method up via org.apache.spark.sql.hive.client.HiveShim, which declares:

private lazy val loadDynamicPartitionsMethod =
  findMethod(
    classOf[Hive],
    "loadDynamicPartitions",
    classOf[Path],
    classOf[String],
    classOf[JMap[String, String]],
    JBoolean.TYPE,
    JInteger.TYPE,
    JBoolean.TYPE,
    JBoolean.TYPE,
    JBoolean.TYPE)
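
As an aside (not part of the original post), the failing lookup can be reproduced outside Spark with plain reflection against whatever hive-exec jar is on the classpath. A minimal sketch, assuming the parameter list shown in the exception above:

    import java.lang.{Boolean => JBoolean, Integer => JInteger}
    import java.util.{Map => JMap}

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.hive.ql.metadata.Hive

    // Minimal sketch: ask the Hive class on the current classpath whether it has
    // loadDynamicPartitions with the parameter list from the exception above.
    object CheckLoadDynamicPartitions {
      def main(args: Array[String]): Unit = {
        val expectedParams: Array[Class[_]] = Array(
          classOf[Path],
          classOf[String],
          classOf[JMap[String, String]],
          JBoolean.TYPE,
          JInteger.TYPE,
          JBoolean.TYPE,
          JBoolean.TYPE,
          JBoolean.TYPE)

        try {
          // getMethod throws NoSuchMethodException when no overload matches,
          // which is exactly the error reported by the Spark job.
          val m = classOf[Hive].getMethod("loadDynamicPartitions", expectedParams: _*)
          println(s"Found: $m")
        } catch {
          case _: NoSuchMethodException =>
            println("No overload with the expected parameter list; available overloads:")
            classOf[Hive].getMethods
              .filter(_.getName == "loadDynamicPartitions")
              .foreach(m => println(s"  $m"))
        }
      }
    }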

I also checked the history of the hive-exec jar, and it seems that the signature of this method in the class org.apache.hadoop.hive.ql.metadata.Hive changed after version 1.0.0. I am new to Spark, but it looks to me like the spark-hive library is built against an older Hive implementation (the META-INF/DEPENDENCIES file inside the jar declares a dependency on org.spark-project.hive:hive-exec:jar:1.2.1.spark). Does anyone know how to make the Spark job use the proper Hive library?

Solution

Make sure you have set the settings below:

SET hive.exec.dynamic.partition=true;
SET hive.exec.max.dynamic.partitions=2048;
SET hive.exec.dynamic.partition.mode=nonstrict;

In Spark, you can set these on the HiveContext as below:

hiveCtx.setConf("hive.exec.dynamic.partition","true")
hiveCtx.setConf("hive.exec.max.dynamic.partitions","2048")
hiveCtx.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

If the problem still exists, I guess it means the Spark version you are building against doesn't match the environment where you are trying to run spark-submit. You can try running your program in spark-shell; if it works there, then try to align your Spark version with the environment settings.

You can set the dependencies in your sbt build (or the equivalent in your pom) as below:

libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.3"
libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.6.3"
libraryDependencies += "org.apache.spark" % "spark-hive_2.10" % "1.6.3"
libraryDependencies += "org.apache.hive" % "hive-exec" % "1.1.0"

Please refer to https://mvnrepository.com/artifact/org.apache.spark

You can get the environment settings by running the command below:

SPARK_PRINT_LAUNCH_COMMAND=true spark-shell

An alternative approach is to use Spark's partitionBy to save the data:

    dataframe.write.mode("overwrite").partitionBy("col1", "col2").json("//path")
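
As a follow-up note, data written this way can be read back with the partition columns recovered from the directory layout; a short sketch, reusing the hiveCtx from above and the placeholder path (the filter value is made up):

    // Sketch: Spark's partition discovery turns the col1=/col2= directories
    // written by partitionBy back into columns when the data is read.
    val df = hiveCtx.read.json("//path")
    df.printSchema()                        // schema includes col1 and col2
    df.filter("col1 = 'someValue'").show()  // prunes partitions via the filter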
