Add column in Hive not allowed from Scala/Spark code


Problem description

I am trying to add a column to a Hive table when the source data contains new columns. The detection of new columns works well; however, when I try to add the column to the destination table, I receive this error:

// For each field of the source DataFrame, issue an ALTER TABLE when the
// field matches the detected new column (held in chk as "[name]").
for (f <- df.schema.fields) {
  if ("[" + f.name + "]" == chk) {
    // Map Spark's "integer" type name to Hive's "int"
    spark.sqlContext.sql("alter table dbo_nwd_orders add columns (" +
      f.name + " " + f.dataType.typeName.replace("integer", "int") + ")")
  }
}
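
The question does not show how chk is computed. As a hypothetical illustration only, the detection step could be a diff between the source DataFrame schema and the destination table schema (table and variable names taken from the snippet above):

// Hypothetical sketch: find columns present in the source DataFrame
// but missing from the destination table.
val targetCols = spark.table("dbo_nwd_orders").schema.fieldNames.map(_.toLowerCase).toSet
val newCols = df.schema.fields.filterNot(f => targetCols.contains(f.name.toLowerCase))
newCols.foreach(f => println(s"New column detected: ${f.name} (${f.dataType.typeName})"))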

Error:

WARN HiveExternalCatalog: Could not alter schema of table  `default`.`dbo_nwd_orders` in a Hive compatible way. Updating Hive metastore in Spark SQL specific format
InvalidOperationException(message:partition keys can not be changed.)

However, if I capture the generated ALTER statement and execute it from the Hive GUI (HUE), the column is added without issues:

alter table dbo_nwd_orders add columns (newCol int)

Why is that statement valid from the GUI but not from Spark code?

Thank you very much.

Recommended answer

It has been said multiple times here, but just to reiterate: Spark is not a Hive interface and is not designed for full Hive compatibility, either in language (Spark targets the SQL standard; Hive uses a custom SQL-like query language) or in capabilities (Spark is an ETL solution; Hive is a data warehousing solution).

Even the data layouts are not fully compatible between the two.

Spark with Hive support is Spark with access to the Hive metastore, not Spark that behaves like Hive.
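
For reference, a minimal sketch of what "Spark with Hive support" means in practice; the application name is hypothetical:

import org.apache.spark.sql.SparkSession

// Enabling Hive support only wires Spark to the Hive metastore;
// it does not make Spark execute DDL the way Hive does.
val spark = SparkSession.builder()
  .appName("hive-metastore-example") // hypothetical app name
  .enableHiveSupport()
  .getOrCreate()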

If you need access to the full set of Hive's features, connect to Hive directly with a native client or a native (not Spark) JDBC connection, and interact with it from there.
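
A minimal sketch of that approach using the standard Hive JDBC driver (org.apache.hive.jdbc.HiveDriver); the host, port, and credentials are assumptions for illustration:

import java.sql.DriverManager

// Assumed HiveServer2 endpoint; replace with your own.
val url = "jdbc:hive2://hiveserver-host:10000/default"

Class.forName("org.apache.hive.jdbc.HiveDriver") // register the Hive JDBC driver
val conn = DriverManager.getConnection(url, "user", "password")
try {
  val stmt = conn.createStatement()
  // The same ALTER that works from HUE works over native JDBC,
  // because it is executed by Hive itself, not by Spark SQL.
  stmt.execute("alter table dbo_nwd_orders add columns (newCol int)")
  stmt.close()
} finally {
  conn.close()
}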
