Apache NiFi for Importing Data from RDBMS to HDFS - Performance Comparison with SQOOP


Problem Description

We are exploring Apache NiFi as a general purpose data ingestion tool for our enterprise requirements.

One typical data ingestion requirement is moving data from RDBMS systems to HDFS.

I was able to build RDBMS to HDFS data movement flow in NiFi using GenerateTableFetch and ExecuteSQL Processors provided by NiFi and everything worked fine for smaller tables.

But, I couldn't test the flow for bigger tables as I was using a standalone distribution.

Has anyone done a performance comparison of NiFi with SQOOP for similar requirements?

Recommended Answer

ExecuteSQL and ExecuteSQLRecord are a better choice. The former automatically converts result sets into Avro. The latter gives you more freedom in how you write the output (JSON, CSV, etc.). One nice thing about ExecuteSQL is that you can chain it with MergeRecord to combine multiple modest-sized result pages into a much bigger block of data, and MergeRecord can use the ParquetWriter to give you ready-made Parquet for insertion into HDFS.
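The flow described above (GenerateTableFetch emitting paged queries, ExecuteSQL running them, MergeRecord combining the pages) can be sketched in plain Python. This is only an illustration of the paging-and-merging pattern, not NiFi itself: the in-memory SQLite table, the `events` table name, and the `fetch_pages` helper are all hypothetical stand-ins, and a real flow would emit Avro or Parquet rather than Python tuples.

```python
import sqlite3

# Hypothetical in-memory table standing in for the source RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events (payload) VALUES (?)",
                 [(f"row-{i}",) for i in range(25)])
conn.commit()

PAGE_SIZE = 10  # analogous to GenerateTableFetch's partition size


def fetch_pages(conn, table, page_size):
    """Yield result pages, as GenerateTableFetch + ExecuteSQL would."""
    offset = 0
    while True:
        rows = conn.execute(
            f"SELECT id, payload FROM {table} ORDER BY id LIMIT ? OFFSET ?",
            (page_size, offset)).fetchall()
        if not rows:
            return
        yield rows
        offset += page_size


# "MergeRecord" step: combine the modest-sized pages into one bigger
# block before writing out (NiFi would serialize this as Parquet).
merged = [row for page in fetch_pages(conn, "events", PAGE_SIZE)
          for row in page]
print(len(merged))  # 25 rows, fetched as 3 pages of 10/10/5
```

The point of merging before the write is the same as in NiFi: many small files are expensive for HDFS, so combining pages into larger blocks first is usually preferable.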

