How to pull data from Mainframe to Hadoop

Problem Description

I have files on the mainframe. I want this data to be pushed to Hadoop (HDFS)/Hive.

I can use Sqoop for the mainframe DB2 database and import it into Hive, but what about files (like COBOL, VSAM, etc.)?

Is there a custom Flume source I can write, or some alternative tool I can use here?

Solution

COBOL is a programming language, not a file format. If what you need is to export files produced by COBOL programs, you can use the same techniques you would use for files produced by C, C++, Java, Perl, PL/I, Rexx, etc.

In general, you will have three different data sources: flat files, VSAM files, and a DBMS such as DB2 or IMS.

DBMSs have export utilities to copy the data into flat files. Keep in mind that data in DB2 will likely be normalized, so you will likely need the contents of related tables in order to make sense of the data.
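
For DB2 specifically, one classic option is the DSNTIAUL sample unload program (the UNLOAD utility is another, if licensed at your site). Below is a minimal sketch of a DSNTIAUL job; the subsystem name, plan, load library, schema, and dataset names are all placeholders, and the SELECT can join the related tables mentioned above to produce a single denormalized extract:

    //UNLOAD   EXEC PGM=IKJEFT01
    //SYSTSPRT DD SYSOUT=*
    //SYSTSIN  DD *
      DSN SYSTEM(DSN1)
      RUN PROGRAM(DSNTIAUL) PLAN(DSNTIAUL) -
          LIB('DSN.RUNLIB.LOAD') PARMS('SQL')
      END
    /*
    //* With PARMS('SQL'), SYSIN holds complete SQL statements
    //SYSIN    DD *
      SELECT C.CUST_ID, C.NAME, A.BALANCE
        FROM MYSCHEMA.CUSTOMER C
        JOIN MYSCHEMA.ACCOUNT A ON A.CUST_ID = C.CUST_ID;
    /*
    //* Unloaded rows go to SYSREC00; SYSPUNCH receives generated
    //* DB2 LOAD control statements (not needed for Hadoop)
    //SYSREC00 DD DSN=MYUSER.CUSTOMER.UNLOAD,DISP=(NEW,CATLG,DELETE),
    //            UNIT=SYSDA,SPACE=(CYL,(10,5))
    //SYSPUNCH DD DSN=MYUSER.CUSTOMER.PUNCH,DISP=(NEW,CATLG,DELETE),
    //            UNIT=SYSDA,SPACE=(TRK,(1,1))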

VSAM files can be exported to flat files via the IDCAMS utility.
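
For example, a minimal IDCAMS REPRO job for that export could look like the following; the dataset names, record length, and space allocations are hypothetical and must match the cluster being copied:

    //EXPORT   EXEC PGM=IDCAMS
    //SYSPRINT DD SYSOUT=*
    //* Input: the VSAM cluster to be exported
    //VSAMIN   DD DSN=PROD.CUSTOMER.KSDS,DISP=SHR
    //* Output: a sequential (flat) copy of the records
    //FLATOUT  DD DSN=PROD.CUSTOMER.FLAT,DISP=(NEW,CATLG,DELETE),
    //            UNIT=SYSDA,SPACE=(CYL,(10,5)),
    //            DCB=(RECFM=FB,LRECL=200,BLKSIZE=0)
    //SYSIN    DD *
      REPRO INFILE(VSAMIN) OUTFILE(FLATOUT)
    /*

Note that the resulting flat file is still EBCDIC and may still contain packed or binary fields, which is why the next step matters.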

I would strongly suggest you get the files into a text format before transferring them to another box with a different code page. Trying to deal with mixed text (which must have its code page translated) and binary (which must not have its code page translated but which likely must be converted from big endian to little endian) is harder than doing the conversion up front.

The conversion can likely be done via the SORT utility on the mainframe. Mainframe SORT utilities tend to have extensive data manipulation functions. There are other mechanisms you could use (other utilities, custom code written in the language of your choice, purchased packages) but this is what we tend to do in these circumstances.
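
As an illustration, assume a made-up record layout of a 10-byte character key followed by a 5-byte packed-decimal amount (in practice you would take the layout from the COBOL copybook). A DFSORT step that copies the file and rewrites the packed field as printable digits might look like:

    //CONVERT  EXEC PGM=SORT
    //SYSOUT   DD SYSOUT=*
    //SORTIN   DD DSN=PROD.CUSTOMER.FLAT,DISP=SHR
    //SORTOUT  DD DSN=PROD.CUSTOMER.TEXT,DISP=(NEW,CATLG,DELETE),
    //            UNIT=SYSDA,SPACE=(CYL,(10,5))
    //SYSIN    DD *
    * Copy records; keep bytes 1-10 as-is and convert the packed
    * field at bytes 11-15 to a 9-digit zoned-decimal (text) field
      SORT FIELDS=COPY
      OUTREC BUILD=(1,10,11,5,PD,TO=ZD,LENGTH=9)
    /*

Every binary, packed, or floating-point field in the record needs an entry like this, so the copybook (or an equivalent record layout) is effectively a prerequisite.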

Once you have your flat files converted such that all data is text, you can transfer them to your Hadoop boxes via FTP, SFTP, or FTPS.
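
On z/OS the transfer can itself run as a batch step using the FTP client; transferring in ASCII mode is what performs the EBCDIC-to-ASCII code page translation on the way out. The host name, credentials, and paths below are placeholders:

    //FTPSTEP  EXEC PGM=FTP,PARM='hadoop-edge.example.com (EXIT'
    //OUTPUT   DD SYSOUT=*
    //INPUT    DD *
      myuser
      mypassword
      ascii
      put 'PROD.CUSTOMER.TEXT' /data/landing/customer.txt
      quit
    /*

From there, something like hdfs dfs -put /data/landing/customer.txt (or a Hive external table pointing at the landing directory) makes the data available to Hive.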

This isn't an exhaustive coverage of the topic, but it will get you started.
