How to Update an ORC Hive table from Spark using Scala


Question

I would like to update a Hive table stored in ORC format. I can update it from my Ambari Hive view, but I am unable to run the same UPDATE statement from Scala (spark-shell).

objHiveContext.sql("select * from table_name") works and I can see the data, but when I run

objHiveContext.sql("update table_name set column_name='testing'") fails with a NoViableAltException (invalid syntax near "update"), whereas I am able to update from the Ambari view (as I have set all the required configurations, i.e. TBLPROPERTIES "orc.compress"="NONE", transactional=true, etc.).

I tried INSERT INTO with CASE statements and so on, but couldn't get it to work. Can we UPDATE Hive ORC tables from Spark? If yes, what is the procedure?

Imports used:

import org.apache.spark.SparkConf
import org.apache.spark._
import org.apache.spark.sql._
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.orc._

Note: I didn't apply any partitioning or bucketing on the table. If I apply bucketing, I can't even view the data when it is stored as ORC.
Hive version: 1.2.1
Spark version: 1.4.1
Scala version: 2.10.6

Answer

Have you tried the DataFrame.write API using SaveMode.Append, per the link below?

http://spark.apache.org/docs/latest/sql-programming-guide.html#manually-specifying-options

Use "orc" as the format and "append" as the save mode; examples are in the link above.
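Since the Spark 1.4 SQL parser does not support UPDATE, a common workaround is to read the table into a DataFrame, rewrite the column there, and write the result back as ORC. A minimal sketch under that assumption, reusing table_name and column_name from the question (the target table table_name_updated is hypothetical; note that SaveMode.Append adds rows to an existing table rather than modifying rows in place, so writing back to the same table would require a staging step):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("OrcUpdateSketch"))
val objHiveContext = new HiveContext(sc)

// Read the existing ORC-backed Hive table into a DataFrame.
val df = objHiveContext.table("table_name")

// Emulate "update table_name set column_name='testing'"
// by replacing the column with a literal value.
val updated = df.withColumn("column_name", lit("testing"))

// Write the result out as ORC. SaveMode.Append appends to an existing
// table; to replace contents, write to a staging table and swap.
updated.write
  .format("orc")
  .mode(SaveMode.Append)
  .saveAsTable("table_name_updated") // hypothetical target table
```

This avoids the UPDATE statement entirely, which is why it works even though the spark-shell parser rejects UPDATE syntax.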
