Storing a DataFrame to a Hive partitioned table in Spark
Problem description
I'm trying to store a stream of data coming in from a Kafka topic into a Hive partitioned table. I was able to convert the DStream to a DataFrame and create a HiveContext. My code looks like this:
val hiveContext = new HiveContext(sc)
hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
newdf.registerTempTable("temp") // newdf is my DataFrame
newdf.write.mode(SaveMode.Append).format("osv").partitionBy("date").saveAsTable("mytablename")
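For context, a minimal end-to-end sketch of how these pieces might fit together in a streaming job. It assumes Spark 1.x APIs, substitutes a socket stream for the Kafka source, and invents a two-column Record(date, value) schema, since the real source and schema aren't shown in the question:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

// Hypothetical schema; "date" doubles as the partition column.
case class Record(date: String, value: String)

object StreamToHive {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamToHive")
    val ssc = new StreamingContext(conf, Seconds(10))

    val hiveContext = new HiveContext(ssc.sparkContext)
    import hiveContext.implicits._
    hiveContext.setConf("hive.exec.dynamic.partition", "true")
    hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

    // Stand-in for the Kafka DStream from the question.
    val messages = ssc.socketTextStream("localhost", 9999)

    messages.foreachRDD { rdd =>
      // Convert each micro-batch to a DataFrame and append it,
      // partitioned by "date". "osv" is the format string from the
      // question, presumably a custom data source on the cluster.
      val newdf = rdd.map(_.split(",")).map(f => Record(f(0), f(1))).toDF()
      newdf.write.mode(SaveMode.Append).format("osv").partitionBy("date").saveAsTable("mytablename")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}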
But when I deploy the app on the cluster, it says:
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: file:/tmp/spark-3f00838b-c5d9-4a9a-9818-11fbb0007076/scratch_hive_2016-10-18_23-18-33_118_769650074381029645-1, expected: hdfs://
When I save it as a normal table and comment out the Hive configurations, it works. But with a partitioned table, it gives me this error.
I also tried registering the DataFrame as a temp table and then writing that table into the partitioned table (a sketch of that attempt follows). Doing that gave me the same error.
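A sketch of what that temp-table attempt might look like as a dynamic-partition insert, assuming hiveContext is the HiveContext above and the target table already exists; the value column is hypothetical, and the partition column must come last in the SELECT:

newdf.registerTempTable("temp")
// Hypothetical column list; only the trailing "date" partition
// column is taken from the question.
hiveContext.sql("INSERT INTO TABLE mytablename PARTITION (date) SELECT value, date FROM temp")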
Can someone please tell me how I can solve it? Thanks.
Solution

I figured it out. In the Spark app's code, I declared the scratch directory location as below, and it worked:
sqlContext.sql("SET hive.exec.scratchdir=<hdfs location>")