delta lake - Insert into sql in pyspark is failing with java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.Alias
Problem description
The Dataproc cluster (image version 2.0.x) was created with the Delta Lake package io.delta:delta-core_2.12:0.7.0; the Spark version is 3.1.1.
The Spark shell is started with:
pyspark --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
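As an aside, the same two settings can also be supplied programmatically when building the session instead of on the `pyspark` command line. The sketch below is a hedged configuration fragment (it assumes `pyspark` is installed and the delta-core jar is already on the classpath, as it is on this Dataproc image); the `appName` value is illustrative.

```python
from pyspark.sql import SparkSession

# Equivalent of the --conf flags above, set on the session builder.
spark = (
    SparkSession.builder
    .appName("delta-poc")  # hypothetical app name
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)
```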
Commands executed to create the Delta table and insert into it with SQL:
spark.sql("""CREATE TABLE IF NOT EXISTS customer(
c_id Long, c_name String, c_city String
)
USING DELTA LOCATION 'gs://edw-bi-dev-dataexports/delta-table-poc/dt_poc/customer'
""")
spark.sql("INSERT INTO customer VALUES(1, 'Shawn', 'Tx')")
Error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/spark/python/pyspark/sql/session.py", line 719, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
File "/usr/lib/spark/python/pyspark/sql/utils.py", line 111, in deco
return f(*a, **kw)
File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o58.sql.
: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.Alias.<init>(Lorg/apache/spark/sql/catalyst/expressions/Expression;Ljava/lang/String;Lorg/apache/spark/sql/catalyst/expressions/ExprId;Lscala/collection/Seq;Lscala/Option;)V
at org.apache.spark.sql.delta.DeltaAnalysis.$anonfun$normalizeQueryColumns$1(DeltaAnalysis.scala:162)
at scala.collection.immutable.List.map(List.scala:293)
at org.apache.spark.sql.delta.DeltaAnalysis.org$apache$spark$sql$delta$DeltaAnalysis$$normalizeQueryColumns(DeltaAnalysis.scala:151)
at org.apache.spark.sql.delta.DeltaAnalysis$$anonfun$apply$1.applyOrElse(DeltaAnalysis.scala:49)
at org.apache.spark.sql.delta.DeltaAnalysis$$anonfun$apply$1.applyOrElse(DeltaAnalysis.scala:45)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:108)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:108)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:221)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:104)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29)
at org.apache.spark.sql.delta.DeltaAnalysis.apply(DeltaAnalysis.scala:45)
at org.apache.spark.sql.delta.DeltaAnalysis.apply(DeltaAnalysis.scala:40)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:216)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:91)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:213)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:205)
at scala.collection.immutable.List.foreach(List.scala:431)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:205)
at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:195)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:189)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:154)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:183)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:183)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:173)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:228)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:172)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:73)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:143)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:143)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:73)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:71)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:63)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:98)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:615)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:610)
at sun.reflect.GeneratedMethodAccessor118.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
I am not able to figure out the root cause of the problem here.
Solution
It's caused by this change in Spark, which broke binary compatibility for the Alias
case class. The fix is either to downgrade the Spark version to 3.0.x, or to wait until a new Delta Lake version is released with support for Spark 3.1.x.
P.S. There are other places in Delta that were broken by changes in Spark 3.1.1.
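The compatibility rule behind this answer can be expressed as a simple guard: delta-core 0.7.0 was compiled against Spark 3.0.x, so running it on a Spark 3.1.x cluster hits binary-incompatibility errors like the NoSuchMethodError above. The helper below is a hypothetical illustration, not a Delta or Spark API.

```python
def delta_070_compatible(spark_version: str) -> bool:
    """Return True if this Spark version matches what delta-core 0.7.0
    was built against (Spark 3.0.x). Hypothetical guard, not a Delta API."""
    major, minor = (int(p) for p in spark_version.split(".")[:2])
    return (major, minor) == (3, 0)

print(delta_070_compatible("3.0.1"))  # True: Spark 3.0.x works with Delta 0.7.0
print(delta_070_compatible("3.1.1"))  # False: the asker's cluster, hence the error
```

In a live session, `spark.version` could be passed to such a check before attempting Delta writes, failing fast with a clear message instead of a NoSuchMethodError from deep inside the analyzer.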