Sqoop import as ORC file


Problem description



Is there any option in Sqoop to import data from an RDBMS and store it as ORC file format in HDFS?

Alternative tried: import as text format, then use a temporary table in Hive to read the input as a text file and write it back to HDFS as ORC. That two-step workaround might look roughly like the sketch below; the staging directory, table names, and single-column schema are hypothetical, since the question does not give them.
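
 sqoop import \
  --connect jdbc:postgresql://foobar:5432/my_db \
  --username foo \
  --password-file hdfs:///user/foobar/foo.txt \
  --table fact \
  --as-textfile \
  --target-dir /staging/fact_text

 hive -e "
  -- external table over the text files Sqoop just wrote (hypothetical schema)
  CREATE EXTERNAL TABLE fact_text (id INT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/staging/fact_text';
  -- rewrite the same data as ORC
  CREATE TABLE fact_orc (id INT) STORED AS ORC;
  INSERT OVERWRITE TABLE fact_orc SELECT * FROM fact_text;
 "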

Solution

At least as of Sqoop 1.4.5 there is HCatalog integration that supports the ORC file format (among others).

For example, you have the option

--hcatalog-storage-stanza

which can be set to

stored as orc tblproperties ("orc.compress"="SNAPPY")

Example:

sqoop import \
 --connect jdbc:postgresql://foobar:5432/my_db \
 --driver org.postgresql.Driver \
 --connection-manager org.apache.sqoop.manager.GenericJdbcManager \
 --username foo \
 --password-file hdfs:///user/foobar/foo.txt \
 --table fact \
 --hcatalog-home /usr/hdp/current/hive-webhcat \
 --hcatalog-database my_hcat_db \
 --hcatalog-table fact \
 --create-hcatalog-table \
 --hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")'
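
To confirm the table really came out as ORC with Snappy compression, one way (a hedged sketch, assuming the hive CLI is available on the machine) is to inspect the table Sqoop created:

 hive -e "DESCRIBE FORMATTED my_hcat_db.fact;" | grep -i 'inputformat\|orc.compress'

An ORC-backed table reports org.apache.hadoop.hive.ql.io.orc.OrcInputFormat as its InputFormat, and the table parameters should show orc.compress=SNAPPY.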

