Error while trying to save the data to Hive tables from dataframe


Problem Description



We hit the following issue when trying to insert data into a Hive table.

Job aborted due to stage failure: Task 5 in stage 65.0 failed 4 times, most recent failure: Lost task 5.3 in stage 65.0 (TID 987, tnblf585.test.sprint.com): java.lang.ArrayIndexOutOfBoundsException: 45
    at org.apache.spark.sql.catalyst.expressions.GenericMutableRow.genericGet(rows.scala:254)
    at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getAs(rows.scala:35)
    at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.isNullAt(rows.scala:36)
    at org.apache.spark.sql.catalyst.expressions.GenericMutableRow.isNullAt(rows.scala:248)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:107)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:104)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:104)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:84)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:84)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
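The `ArrayIndexOutOfBoundsException: 45` thrown from `isNullAt` suggests the Hive writer asked for a field at index 45 that the dataframe's rows do not have, which is what a schema mismatch between the dataframe and the table produces. As a language-neutral illustration (plain Python, not Spark internals, with a hypothetical `is_null_at` helper), looking up one position past the end of a 45-field row fails the same way:

```python
# A row with 45 fields occupies indices 0..44; index 45 is out of bounds.
row = ["value"] * 45

def is_null_at(fields, i):
    """Hypothetical stand-in for the isNullAt(i) check in the stack trace."""
    return fields[i] is None

print(is_null_at(row, 44))  # last valid index, succeeds

try:
    is_null_at(row, 45)     # one past the end, like the failing task
except IndexError as exc:
    print("out of bounds:", exc)
```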

Solution

I figured out that one of the column names in the dataframe and the Hive table did not match; after correcting the column name, the data loaded correctly.
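One way to catch this kind of mismatch before writing is to diff the dataframe's column names against the target table's columns. The sketch below is plain Python set logic over two hypothetical column lists (the names are invented for illustration); in PySpark you would obtain the real lists from `df.columns` and `spark.table("mydb.mytable").columns`.

```python
# Hypothetical column lists; in PySpark these would come from
# df.columns and spark.table("mydb.mytable").columns.
df_cols = ["id", "cust_name", "order_dt"]
hive_cols = ["id", "customer_name", "order_dt"]

# Columns present on one side only point directly at the mismatch.
only_in_df = [c for c in df_cols if c not in hive_cols]
only_in_hive = [c for c in hive_cols if c not in df_cols]

print("only in dataframe:", only_in_df)    # ['cust_name']
print("only in Hive table:", only_in_hive)  # ['customer_name']
```

Running a check like this before the insert turns an opaque executor-side `ArrayIndexOutOfBoundsException` into an explicit list of the columns that need renaming.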
