How to read an ORC transactional Hive table in Spark?
Question
How do I read an ORC transactional Hive table in Spark? When I read an ORC transactional table through Spark, I get the schema of the Hive table, but I am not able to read the actual data.
Here is the complete scenario:
hive> create table default.Hello(id int,name string) clustered by (id) into 2 buckets STORED AS ORC TBLPROPERTIES ('transactional'='true');
hive> insert into default.hello values(10,'abc');
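At this point the row is readable from Hive itself, which confirms the insert succeeded and the problem lies on the Spark side. A quick check (the output shown is what Hive should return for the single inserted row):

```
hive> select * from default.hello;
10    abc
```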
Now I am trying to access the Hive ORC data from Spark SQL, but it shows only the schema:
spark.sql("select * from hello").show()
Output: only the column headers id and name; no rows are returned.
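The reason is the on-disk layout of ACID tables: each transactional insert is written to a delta_* directory under the table location, and Spark's native ORC reader does not apply the ACID merge logic, so it finds no readable base data. One workaround is to trigger a major compaction from Hive, which rewrites the deltas into a base_* directory that Spark can read (a sketch; run it in the Hive shell):

```
hive> alter table default.hello compact 'major';
```

Compaction runs asynchronously; once it completes, the spark.sql query above returns the row.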
Accepted answer
Yes, as a workaround we can use compaction, but when the job is a micro-batch, compaction won't help, so I decided to use a JDBC call. Please refer to my answer to this issue at the link below, or see my GIT page - https://github.com/Gowthamsb12/Spark/blob/master/Spark_ACID
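The JDBC route the answer describes can be sketched as follows in spark-shell. Reading through HiveServer2 works because Hive applies the ACID merge logic server-side before handing rows to Spark. The URL, credentials, and the hive-jdbc driver on the classpath are assumptions to adapt to your cluster:

```scala
// Hedged sketch: read the ACID table via HiveServer2's JDBC endpoint
// instead of Spark's native ORC reader.
// Assumes HiveServer2 at localhost:10000 and the hive-jdbc driver on the classpath.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:hive2://localhost:10000/default")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .option("dbtable", "hello")
  .load()

df.show()
```

Note that the Hive JDBC driver may return column names prefixed with the table name (e.g. hello.id); the linked GIT page handles details like this.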