delta lake - Insert into sql in pyspark is failing with java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.Alias

Problem description

The Dataproc cluster was created from image 2.0.x with the package io.delta:delta-core_2.12:0.7.0.

The Spark version is 3.1.1.

The Spark shell is started with:

  pyspark --conf"spark.sql.extensions = io.delta.sql.DeltaSparkSessionExtension";\--conf spark.sql.catalog.spark_catalog = org.apache.spark.sql.delta.catalog.DeltaCatalog 

执行命令以创建增量表并将其插入增量SQL:

  spark.sql(如果不存在,请创建表"(c_id Long,c_name字符串,c_city字符串)使用DELTA位置'gs://edw-bi-dev-dataexports/delta-table-poc/dt_poc/customer'(")spark.sql("INSERT INTO customer VALUES(1,'Shawn','Tx')") 

错误:

  Traceback(最近一次通话最近):在< module>中的文件< stdin>"第1行.sql中的文件"/usr/lib/spark/python/pyspark/sql/session.py",第719行返回DataFrame(self._jsparkSession.sql(sqlQuery),self._wrapped)在__call__中的文件"/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py"中,第1305行装饰中的文件"/usr/lib/spark/python/pyspark/sql/utils.py",第111行返回f(* a,** kw)在get_return_value中的第328行中的文件"/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py"py4j.protocol.Py4JJavaError:调用o58.sql时发生错误.:java.lang.NoSuchMethodError:org.apache.spark.sql.catalyst.expressions.Alias.< init>(Lorg/apache/spark/sql/catalyst/expressions/Expression; Ljava/lang/String; Lorg/apache/spark/sql/catalyst/expressions/ExprId; Lscala/collection/Seq; Lscala/Option;)V在org.apache.spark.sql.delta.DeltaAnalysis.$ anonfun $ normalizeQueryColumns $ 1(DeltaAnalysis.scala:162)在scala.collection.immutable.List.map(List.scala:293)在org.apache.spark.sql.delta.DeltaAnalysis.org $ apache $ spark $ sql $ delta $ DeltaAnalysis $$ normalizeQueryColumns(DeltaAnalysis.scala:151)在org.apache.spark.sql.delta.DeltaAnalysis $$ anonfun $ apply $ 1.applyOrElse(DeltaAnalysis.scala:49)在org.apache.spark.sql.delta.DeltaAnalysis $$ anonfun $ apply $ 1.applyOrElse(DeltaAnalysis.scala:45)在org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$ anonfun $ resolveOperatorsDown $ 2(AnalysisHelper.scala:108)中在org.apache.spark.sql.catalyst.trees.CurrentOrigin $ .withOrigin(TreeNode.scala:73)在org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$ anonfun $ resolveOperatorsDown $ 1(AnalysisHelper.scala:108)中在org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper $ .allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:221)在org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106)在org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown $(AnalysisHelper.scala:104)在org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29)在org.apache.spark.sql.delta.DeltaAnalysis.apply(DeltaAnalysis.scala:45)在org.apache.spark.sql.delta.DeltaAnalysis.apply(DeltaAnalysis.scala:40)在org.apache.spark.sql.catalyst.rules.RuleExecutor.$ anonfun $ execute $ 2(RuleExecutor.scala:216)中在scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)在scala.collection.LinearSeqOptimized.foldLeft $(LinearSeqOptimized.scala:122)在scala.collection.immutable.List.foldLeft(List.scala:91)在org.apache.spark.sql.catalyst.rules.RuleExecutor.$ anonfun $ execute $ 1(RuleExecutor.scala:213)中在org.apache.spark.sql.catalyst.rules.RuleExecutor.$ anonfun $ execute $ 1 $ adapted(RuleExecutor.scala:205)在scala.collection.immutable.List.foreach(List.scala:431)在org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:205)在org.apache.spark.sql.catalyst.analysis.Analyzer.org上$ apache $ spark $ sql $ catalyst $ analysis $ Analyzer $$ executeSameContext(Analyzer.scala:195)在org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:189)在org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:154)在org.apache.spark.sql.catalyst.rules.RuleExecutor.$ anonfun $ executeAndTrack $ 1(RuleExecutor.scala:183)中在org.apache.spark.sql.catalyst.QueryPlanningTracker $ .withTracker(QueryPlanningTracker.scala:88)在org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:183)在org.apache.spark.sql.catalyst.analysis.Analyzer.$ anonfun $ executeAndCheck $ 1(Analyzer.scala:173)中在org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper $ 
.markInAnalyzer(AnalysisHelper.scala:228)在org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:172)在org.apache.spark.sql.execution.QueryExecution.$ anonfun $ analyzed $ 1(QueryExecution.scala:73)在org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)位于org.apache.spark.sql.execution.QueryExecution.$ anonfun $ executePhase $ 1(QueryExecution.scala:143)在org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)在org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:143)在org.apache.spark.sql.execution.QueryExecution.analyzed $ lzycompute(QueryExecution.scala:73)在org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:71)在org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:63)位于org.apache.spark.sql.Dataset $.$ anonfun $ ofRows $ 2(Dataset.scala:98)在org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)在org.apache.spark.sql.Dataset $ .ofRows(Dataset.scala:96)在org.apache.spark.sql.SparkSession.$ anonfun $ sql $ 1(SparkSession.scala:615)在org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)在org.apache.spark.sql.SparkSession.sql(SparkSession.scala:610)在sun.reflect.GeneratedMethodAccessor118.invoke(未知来源)在sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)在java.lang.reflect.Method.invoke(Method.java:498)在py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)在py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)在py4j.Gateway.invoke(Gateway.java:282)在py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)在py4j.commands.CallCommand.execute(CallCommand.java:79)在py4j.GatewayConnection.run(GatewayConnection.java:238) 

在这里我无法找出问题的根本原因.

pyspark --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \ --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog

Commands executed to create the Delta table and insert into it:

spark.sql("""CREATE TABLE IF NOT EXISTS customer(
             c_id Long, c_name String, c_city String
             )
           USING DELTA LOCATION 'gs://edw-bi-dev-dataexports/delta-table-poc/dt_poc/customer'
         """)

spark.sql("INSERT INTO customer VALUES(1, 'Shawn', 'Tx')")

The INSERT INTO statement fails with the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/spark/python/pyspark/sql/session.py", line 719, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o58.sql.
: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.Alias.<init>(Lorg/apache/spark/sql/catalyst/expressions/Expression;Ljava/lang/String;Lorg/apache/spark/sql/catalyst/expressions/ExprId;Lscala/collection/Seq;Lscala/Option;)V
        at org.apache.spark.sql.delta.DeltaAnalysis.$anonfun$normalizeQueryColumns$1(DeltaAnalysis.scala:162)
        at scala.collection.immutable.List.map(List.scala:293)
        at org.apache.spark.sql.delta.DeltaAnalysis.org$apache$spark$sql$delta$DeltaAnalysis$$normalizeQueryColumns(DeltaAnalysis.scala:151)
        at org.apache.spark.sql.delta.DeltaAnalysis$$anonfun$apply$1.applyOrElse(DeltaAnalysis.scala:49)
        at org.apache.spark.sql.delta.DeltaAnalysis$$anonfun$apply$1.applyOrElse(DeltaAnalysis.scala:45)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:108)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:108)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:221)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:104)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29)
        at org.apache.spark.sql.delta.DeltaAnalysis.apply(DeltaAnalysis.scala:45)
        at org.apache.spark.sql.delta.DeltaAnalysis.apply(DeltaAnalysis.scala:40)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:216)
        at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
        at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
        at scala.collection.immutable.List.foldLeft(List.scala:91)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:213)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:205)
        at scala.collection.immutable.List.foreach(List.scala:431)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:205)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:195)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:189)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:154)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:183)
        at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:183)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:173)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:228)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:172)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:73)
        at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:143)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
        at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:143)
        at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:73)
        at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:71)
        at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:63)
        at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:98)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
        at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:615)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:610)
        at sun.reflect.GeneratedMethodAccessor118.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)

I am not able to figure out the root cause of the problem here.

Solution

It's caused by this change, which broke binary compatibility for the Alias case class: the Alias constructor signature changed in Spark 3.1, so delta-core 0.7.0, which was compiled against Spark 3.0.x, can no longer find the old <init> at runtime (the NoSuchMethodError in the traceback above). The fix is to either downgrade the Spark version to 3.0.x, or wait until a new Delta version is released with support for 3.1.x.
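
A quick way to confirm the mismatch from the PySpark shell is to check the runtime version; a trivial sketch:

# delta-core 0.7.0 targets Spark 3.0.x; a session reporting 3.1.x here
# will hit the NoSuchMethodError above once Delta's analysis rules run.
print(spark.version)  # '3.1.1' in this environment; 3.0.x is compatible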

P.S. There are other places in Delta that were broken by changes in Spark 3.1.1.
