Can't save table to Hive metastore, HDP 3.0

Problem description

I can't save a table to the Hive database anymore using the metastore. I can see the tables in Spark using spark.sql, but I can't see the same tables in the Hive database. I tried the following, but it doesn't store the table to Hive. How can I configure the Hive metastore? The Spark version is 2.3.1.

If you want more details, please comment.

%spark
import org.apache.spark.sql.SparkSession
val spark = (SparkSession
        .builder
        .appName("interfacing spark sql to hive metastore without configuration file")
        .config("hive.metastore.uris", "thrift://xxxxxx.xxx:9083") // replace with your hivemetastore service's thrift url
        .enableHiveSupport() // don't forget to enable hive support
        .getOrCreate())

spark.conf.get("spark.sql.warehouse.dir")// Output: res2: String = /apps/spark/warehouse
spark.conf.get("hive.metastore.warehouse.dir")// NotSuchElement Exception
spark.conf.get("spark.hadoop.hive.metastore.uris")// NotSuchElement Exception

var df = (spark
        .read
        .format("parquet")
        .load(dataPath))
df.createOrReplaceTempView("my_temp_table");
spark.sql("drop table if exists my_table");
spark.sql("create table my_table using hive as select * from my_temp_table");
spark.sql("show tables").show(false)// I see my_table in default database

Update after @catpaws' answer: on HDP 3.0 and later, Hive and Spark use independent catalogues.

Save a table to the Spark catalog:

df.createOrReplaceTempView("my_temp_table");
spark.sql("create table my_table as select * from my_temp_table");

vs.

Save a table to the Hive catalog:

val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()

hive.createTable("newTable")
  .ifNotExists()
  .column("ws_sold_time_sk", "bigint")
  ...// x 200 columns
  .column("ws_ship_date_sk", "bigint")
  .create()

df.write.format(HIVE_WAREHOUSE_CONNECTOR)
  .option("table", "newTable")
  .save()

As you can see, this way the Hive Warehouse Connector is very impractical for dataframes with hundreds of columns. Is there any way to save large dataframes to Hive?

Recommended answer

As @catpaws said, Spark and Hive use independent catalogues. To save a dataframe with many columns using the Hive Warehouse Connector, you can use my function:

save_table_hwc(df1, "default", "table_test1")

def save_table_hwc(df: DataFrame, database: String, tableName: String) : Unit = {
    hive.setDatabase(database)
    // Drop any existing table with the same name first
    hive.dropTable(tableName, true, false)
    // Build the target Hive table column by column from the dataframe schema
    var table_builder = hive.createTable(tableName)
    for( i <- 0 to df.schema.length-1){
        // Strip characters that are neither letters nor digits from the column name
        var name = df.schema.toList(i).name.replaceAll("[^\\p{L}\\p{Nd}]+", "")
        var data_type = df.schema.toList(i).dataType.sql
        table_builder = table_builder.column(name, data_type)
    }
    table_builder.create()
    // Write the dataframe into the newly created table through the connector
    df.write.format(HIVE_WAREHOUSE_CONNECTOR).option("table", tableName).save()
}
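
The function assumes that a HiveWarehouseSession named hive and the HIVE_WAREHOUSE_CONNECTOR format string are already defined in the notebook. A minimal setup sketch, assuming the HWC jar is on the classpath and the HiveServer2/metastore connection properties are already set in the Spark config; the spelled-out connector class name below is an assumption, prefer the constant shipped with your HWC version:

import org.apache.spark.sql.DataFrame
import com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder

// Build an HWC session from the existing SparkSession
val hive = HiveWarehouseBuilder.session(spark).build()

// Data source name used with df.write.format(...); HWC normally exposes this
// as a HIVE_WAREHOUSE_CONNECTOR constant, written out here as an assumption
val HIVE_WAREHOUSE_CONNECTOR = "com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector"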
