How to insert a Spark DataFrame into a Hive internal table?

Question

What's the right way to insert a DataFrame into a Hive internal table in append mode? It seems we can either write the DataFrame directly to Hive using the "saveAsTable" method, or register the DataFrame as a temp table and then use a query.

df.write().mode("append").saveAsTable("tableName")

OR

df.registerTempTable("temptable") 
sqlContext.sql("CREATE TABLE IF NOT EXISTS mytable as select * from temptable")

Will the second approach append the records or overwrite them?

Is there any other way to write the DataFrame to a Hive internal table effectively?

Recommended answer

df.saveAsTable("tableName", "append") is deprecated. Instead, you should use the second approach.

sqlContext.sql("CREATE TABLE IF NOT EXISTS mytable as select * from temptable")

It will create the table if the table does not exist. When you run your code a second time, you need to drop the existing table, otherwise your code will exit with an exception.

Another approach, if you don't want to drop the table: create the table separately, then insert your data into it.
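
A minimal sketch of that separate CREATE step, assuming a HiveContext-backed sqlContext (Spark 1.6) and a hypothetical (id INT, name STRING) schema; adjust the columns to match your DataFrame:

// Create the target Hive table once, with an explicit (hypothetical) schema.
sqlContext.sql("CREATE TABLE IF NOT EXISTS mytable (id INT, name STRING)")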

The code below will append data to the existing table:

sqlContext.sql("insert into table mytable select * from temptable")

And the code below will overwrite the data in the existing table:

sqlContext.sql("insert overwrite table mytable select * from temptable")

This answer is based on Spark 1.6.2. If you are using a different version of Spark, I would suggest checking the appropriate documentation.
