Re-parsing Blob data stored in HDFS imported from Oracle by Sqoop


Problem description


Using Sqoop, I've successfully imported a few rows from a table that has a BLOB column. The part-m-00000 file now contains all the records, with the BLOB field written out as part of the CSV.

Questions:

1) According to the docs, knowledge of the Sqoop-specific format can help in reading those blob records. So what does "Sqoop-specific format" mean?

2) Each blob is a .gz file of a text file containing some float data. These .gz files are stored in the Oracle DB as blobs and imported into HDFS using Sqoop. How can I get the float data back from the HDFS file? Any sample code would be very useful.

Solution

I see these options.

  1. Sqoop import from Oracle directly into a Hive table with a binary data type. This option may limit processing outside Hive (MR, Pig, etc.), because you would need to know how the blob is stored as binary in Hive. This is the same limitation you described in your question 1.

  2. Sqoop import from Oracle to the Avro, sequence, or ORC file formats, which can hold binary data. You should be able to read the result by creating a Hive external table on top of it, and you can write a Hive UDF to decompress the binary data. This option is more flexible, since the data can also be processed easily with MR, especially in the Avro and sequence file formats.

Hope this helps. How did you resolve it?
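Whichever option is used, the last step is the same: take the raw blob bytes, gunzip them, and parse the floats. The following is a minimal Python sketch of that step, not a tested recipe for this dataset. It assumes the blob column in the text import is written as hexadecimal bytes (optionally separated by spaces), that the blob is stored inline in the record rather than in a separate LOB file, and that the decompressed text holds whitespace-separated floats. The function name `decode_blob_field` is hypothetical.

```python
import gzip
from typing import List


def decode_blob_field(hex_field: str) -> List[float]:
    """Turn one hex-encoded, gzip-compressed blob field into a list of floats.

    Assumption: the field is hex text such as "1f 8b 08 ..." (spaces optional).
    Adjust the first step if your import encodes the column differently.
    """
    # bytes.fromhex skips the spaces between byte pairs.
    raw = bytes.fromhex(hex_field.strip())
    # The blob is a .gz file, so a plain gzip decompress recovers the text.
    text = gzip.decompress(raw).decode("utf-8")
    # Assumption: floats are separated by whitespace (spaces or newlines).
    return [float(token) for token in text.split()]


if __name__ == "__main__":
    # Build a stand-in for one blob field: gzip some float text, hex-encode it.
    payload = gzip.compress(b"1.5 2.25\n3.0\n")
    sample_field = " ".join(f"{b:02x}" for b in payload)
    print(decode_blob_field(sample_field))
```

With Avro or sequence files the hex-decoding step drops out, since the column already arrives as bytes; only the `gzip.decompress` and parsing steps are needed.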
